outliersInCompositions {compositions} | R Documentation
The Philosophy behind outlier treatment in library(compositions).
Outliers are omnipresent in all kinds of data analysis. To avoid catastrophic misinterpretations, robust statistics has developed methods that limit the distorting influence of outliers. The introduction of robust methods into the compositions package is described in robustnessInCompositions.
However, sometimes we are interested directly in the analysis of outliers. The central philosophy of the outlier classification subsystem in compositions is that outliers are in most cases not simply erroneous observations, but rather products of some systematic anomaly. This can be, e.g., an error in an individual component, a secondary process, or a minor, undetected but different subpopulation. The package provides various concepts to investigate possible reasons for outliers in compositional datasets.
OutlierClassifier1(...,type="outlier"), outlierplot(...,type=c("scatter","biplot"),class.type="outlier"), qMaxMahalanobis(...).
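The idea behind a Mahalanobis-distance cutoff can be sketched in base R. This is a generic illustration only: the simulated data, the additive log-ratio coordinates, the classical (non-robust) mean and covariance, and the chi-square cutoff are all assumptions made for the example. The package's own classifiers rely on robust estimation (via robustbase) and on their own quantile functions such as qMaxMahalanobis, none of which is reproduced here.

```r
# Generic sketch (base R only): flag observations whose squared Mahalanobis
# distance in log-ratio coordinates exceeds a chi-square quantile.
set.seed(1)

# Hypothetical 3-part data: 50 typical rows plus one deliberately atypical row.
raw <- cbind(A = rlnorm(50, 0, 0.2),
             B = rlnorm(50, 0.7, 0.2),
             C = rlnorm(50, 0, 0.2))
raw <- rbind(raw, c(A = 20, B = 1, C = 1))
comp <- raw / rowSums(raw)            # close each row to sum 1

# Additive log-ratio coordinates with the last part as divisor (an assumption;
# the Mahalanobis distance does not depend on which full-rank log-ratio
# coordinate system is used).
alr <- log(comp[, 1:2] / comp[, 3])

# Squared Mahalanobis distances from the classical mean and covariance.
d2 <- mahalanobis(alr, center = colMeans(alr), cov = cov(alr))

# Illustrative cutoff: 0.975 quantile of a chi-square with 2 degrees of freedom.
cutoff <- qchisq(0.975, df = ncol(alr))
flagged <- which(d2 > cutoff)
flagged
```

Because the classical estimates are themselves pulled towards atypical rows, a robust estimator is usually preferable in practice, which is the motivation for the package's dependence on robustbase.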
OutlierClassifier1(...,type="grade"), outlierplot(...,type=c("scatter","biplot"),class.type="grade"), qEmpiricalMahalanobis(...).
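A minimal usage sketch of the grade classifier, assuming the compositions and robustbase packages are installed. The simulated dataset follows the construction used in this page's examples; the object names X, grade and take are chosen here for illustration only.

```r
library(compositions)

set.seed(1400)
A <- matrix(c(0.1, 0.2, 0.3, 0.1), nrow = 2)
Mvar <- 0.1 * ilrvar2clr(A %*% t(A))     # clr variance of the main population
Mcenter <- acomp(c(1, 2, 1))

# 100 typical compositions plus one deliberately atypical composition.
typicalData <- rnorm.acomp(100, Mcenter, Mvar)
colnames(typicalData) <- c("A", "B", "C")
X <- acomp(rbind(typicalData, acomp(c(0.5, 1.5, 2))))
colnames(X) <- colnames(typicalData)

# Classify every observation by grade; anything not "ok" is treated as a suspect.
grade <- OutlierClassifier1(X, type = "grade")
take <- grade != "ok"

table(grade)   # number of observations per grade
sum(take)      # number of suspects
```

The logical vector take can then be used to subset the data, as in the clustering commands shown later on this page.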
OutlierClassifier1(...,type=c("best","type","all")).
outlierplot(...,type="portions")), to prove the existence of such outliers, to give a lower bound for their number, and to provide us with suspects, each with an associated outlyingness probability. See outlierplot(...,type="portions"), outlierplot(...,type="nout"), pQuantileMahalanobis(...).
ClusterFinder1.
take<-OutlierClassifier1(data,type="grade")!="ok"
hc<-hclust(dist(normalize(acomp(scale(data)[take,]))),method="complete")
and to plot by a command like:
plot(hc)
and plot(acomp(data[take,]),col=cutree(hc,h=1.5))
The package robustbase is required for the robust estimation methods and the outlier subsystem of compositions. To simplify installation it is not listed as a required package, but it is loaded whenever any kind of outlier detection or robust estimation is used.
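Since robustbase is only loaded on demand, it can be useful to check for it before calling any outlier routine. The following is a small sketch in base R; the variable name has_robustbase is hypothetical.

```r
# Check whether robustbase is installed without attaching it.
has_robustbase <- requireNamespace("robustbase", quietly = TRUE)

if (!has_robustbase) {
  message("Package 'robustbase' is missing; ",
          "install it with install.packages(\"robustbase\").")
}
```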
K. Gerald v.d. Boogaart http://www.stat.boogaart.de
K. Gerald van den Boogaart, Raimon Tolosana-Delgado, Matevz Bren (2009) Robustness, classification and visualization of outliers in compositional data, in prep.
compositions-package, missingsInCompositions, robustnessInCompositions, outliersInCompositions, outlierplot, OutlierClassifier1, ClusterFinder1
## Not run: 
# Too slow
tmp<-set.seed(1400)
A <- matrix(c(0.1,0.2,0.3,0.1),nrow=2)
Mvar <- 0.1*ilrvar2clr(A%*%t(A))
Mcenter <- acomp(c(1,2,1))
typicalData <- rnorm.acomp(100,Mcenter,Mvar) # main population
colnames(typicalData)<-c("A","B","C")
data1 <- acomp(rnorm.acomp(100,Mcenter,Mvar))
data2 <- acomp(rbind(typicalData+rbinom(100,1,p=0.1)*rnorm(100)*acomp(c(4,1,1))))
data3 <- acomp(rbind(typicalData,acomp(c(0.5,1.5,2))))
colnames(data3)<-colnames(typicalData)
tmp<-set.seed(30)
rcauchy.acomp <- function (n, mean, var){
  D <- gsi.getD(mean)-1
  perturbe(ilrInv(matrix(rnorm(n*D)/rep(rnorm(n),D), ncol = D) %*% chol(clrvar2ilr(var))), mean)
}
data4 <- acomp(rcauchy.acomp(100,acomp(c(1,2,1)),Mvar/4))
colnames(data4)<-colnames(typicalData)
data5 <- acomp(rbind(unclass(typicalData)+outer(rbinom(100,1,p=0.1)*runif(100),c(0.1,1,2))))
data6 <- acomp(rbind(typicalData,rnorm.acomp(20,acomp(c(4,4,1)),Mvar)))
datas <- list(data1=data1,data2=data2,data3=data3,data4=data4,data5=data5,data6=data6)
tmp <-c()

# Scatter plots classified by outlier grade
opar<-par(mfrow=c(2,3),pch=19,mar=c(3,2,2,1))
tmp<-mapply(function(x,y) {
  outlierplot(x,type="scatter",class.type="grade");
  title(y)
},datas,names(datas))

# Scatter plots classified by the best single-component explanation
par(mfrow=c(2,3),pch=19,mar=c(3,2,2,1))
tmp<-mapply(function(x,y) {
  myCls2 <- OutlierClassifier1(x,alpha=0.05,type="all",corrected=TRUE)
  outlierplot(x,type="scatter",classifier=OutlierClassifier1,class.type="best",
    Legend=legend(1,1,levels(myCls),xjust=1,col=colcode,pch=pchcode),
    pch=as.numeric(myCls2));
  legend(0,1,legend=levels(myCls2),pch=1:length(levels(myCls2)))
  title(y)
},datas,names(datas))

# Empirical distribution functions of the Mahalanobis distances
par(mfrow=c(2,3),pch=19,mar=c(3,2,2,1))
for( i in 1:length(datas) )
  outlierplot(datas[[i]],type="ecdf",main=names(datas)[i])

# Portion plots
par(mfrow=c(2,3),pch=19,mar=c(3,2,2,1))
for( i in 1:length(datas) )
  outlierplot(datas[[i]],type="portion",main=names(datas)[i])

# Number-of-outliers plots
par(mfrow=c(2,3),pch=19,mar=c(3,2,2,1))
for( i in 1:length(datas) )
  outlierplot(datas[[i]],type="nout",main=names(datas)[i])
par(opar)

# Cluster the suspected outliers of a pooled dataset
moreData <- acomp(rbind(data3,data5,data6))
take<-OutlierClassifier1(moreData,type="grade")!="ok"
hc<-hclust(dist(normalize(acomp(scale(moreData)[take,]))),method="complete")
plot(hc)
plot(acomp(moreData[take,]),col=cutree(hc,h=1.5)) # cut the dendrogram at height 1.5
## End(Not run)