CauRuimet {PTAk} R Documentation

Robust estimation of within group covariance

Description

Gives a robust estimate of an unknown within-group covariance, aiming either at dense groups or at sparse groups (outliers), depending on the local variance used and the choice of weighting function.

Usage

 CauRuimet(Z,ker=1,m0=1,withingroup=TRUE,
              loc=substitute(apply(Z,2,mean,trim=.1)),matrixmethod=TRUE)

        

Arguments

Z a matrix (observations in rows, variables in columns)
ker either numerical or a function: if numerical, the weighting function is e^{-ker\,t}; otherwise
ker=function(t){return(expression)} must be a positive decreasing function.
m0 a neighbourhood graph or another proximity matrix; its Hadamard (elementwise) product with the kernel proximities is taken
withingroup logical; if TRUE the aim is a robust estimate for dense groups, if FALSE the aim is a robust estimate for outliers
loc a vector of locations, or a function (e.g. using the mean or the median) that gives an estimate of it
matrixmethod logical; if TRUE (only used when withingroup is TRUE), matrix computations are used instead of the double loop suggested by the formula given in Details
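
As a minimal sketch of the two forms ker can take, the following base R code builds the weighting function from either a number or a function. The helper name make_kernel is hypothetical and is not part of PTAk; it only illustrates the convention described above.

```r
# Hypothetical helper, not part of PTAk: turn `ker` into a weighting function.
make_kernel <- function(ker) {
  if (is.function(ker)) {
    ker                              # user-supplied positive decreasing function
  } else {
    function(t) exp(-ker * t)        # numeric ker: exponential weight e^{-ker t}
  }
}

w_exp <- make_kernel(1)                          # behaves like function(t) exp(-t)
w_inv <- make_kernel(function(t) 1 / (1 + t))    # another positive decreasing choice

t <- c(0, 0.5, 2)                    # some squared distances
w_exp(t)                             # weights decrease as the distance grows
w_inv(t)
```

Both weights equal 1 at distance 0 and decrease with t; a larger numeric ker makes the weighting more local.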

Details

When withingroup is TRUE, a local variance formula (local as defined by the weighting) is used, aiming at finding dense groups:

W_g = \frac{\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} ker(d^2_{S^-}(Z_i,Z_j))\,(Z_i-Z_j)'(Z_i-Z_j)}{\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} ker(d^2_{S^-}(Z_i,Z_j))}

where d^2_{S^-}( . , . ) is the squared Euclidean distance in the metric S^-, the inverse of a robust sample covariance (i.e. computed using loc instead of the mean). When withingroup is FALSE, a weighted global variance is used:

W_o = \frac{\sum_{i=1}^{n} ker(d^2_{S^-}(Z_i,\tilde{Z}))\,(Z_i-\tilde{Z})'(Z_i-\tilde{Z})}{\sum_{i=1}^{n} ker(d^2_{S^-}(Z_i,\tilde{Z}))}

where \tilde{Z} is the vector loc.
If m0 is a neighbourhood graph and ker is the function returning 1 (so that no distance-based proximity is used), the function returns (when withingroup=TRUE) the local variance-covariance matrix as defined in Lebart (1969).
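
The two formulas can be transcribed directly into base R. The sketch below is not the PTAk source: the function name local_cov is hypothetical, ker is assumed numeric, m0 is ignored, and the robust covariance S is assumed to be the cross-product about loc divided by n.

```r
# Sketch only: direct transcription of the W_g and W_o formulas (double loop).
local_cov <- function(Z, ker = 1, withingroup = TRUE,
                      loc = apply(Z, 2, mean, trim = 0.1)) {
  Z <- as.matrix(Z)
  n <- nrow(Z)
  w <- function(t) exp(-ker * t)          # numeric ker assumed
  Zc <- sweep(Z, 2, loc)                  # centre on the robust location
  Sinv <- solve(crossprod(Zc) / n)        # S^-; the divisor n is an assumption
  num <- matrix(0, ncol(Z), ncol(Z))
  den <- 0
  if (withingroup) {
    # W_g: weighted sum over all pairs i < j
    for (i in seq_len(n - 1)) {
      for (j in (i + 1):n) {
        d <- Z[i, ] - Z[j, ]
        k <- w(drop(t(d) %*% Sinv %*% d)) # kernel of the squared distance
        num <- num + k * tcrossprod(d)    # p x p outer product of the difference
        den <- den + k
      }
    }
  } else {
    # W_o: weighted sum of deviations from loc
    for (i in seq_len(n)) {
      d <- Zc[i, ]
      k <- w(drop(t(d) %*% Sinv %*% d))
      num <- num + k * tcrossprod(d)
      den <- den + k
    }
  }
  num / den
}

Z <- as.matrix(iris[, 1:4])
Wg <- local_cov(Z, ker = 1, withingroup = TRUE)      # aimed at dense groups
Wo <- local_cov(Z, ker = 0.15, withingroup = FALSE)  # aimed at outliers
```

With ker = 0 every weight is 1, so W_o reduces to the plain covariance about loc, which gives a quick sanity check of the transcription.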

Value

A matrix: the robust covariance estimate (W_g if withingroup is TRUE, W_o otherwise).

Note

As mentioned by Caussinus and Ruiz, a good strategy to reveal dense groups with generalised PCA is to reveal outliers first using the metric W_o^{-1}, and to remove them before using the metric W_g^{-1}. Based on theoretical considerations, with the decreasing function e^{-ker\,t} they recommend choosing ker no smaller than 1 when withingroup is TRUE, and fairly small, say in the interval [0.05, 0.3], otherwise.
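
The "outliers first" step can be illustrated on simulated data. This is a simplified stand-in, not the procedure of Caussinus and Ruiz, who inspect generalised PCA projections: here points are flagged by their squared distance to loc in the metric W_o^{-1}. The helper weighted_cov, the simulated data and the 95% cut-off are all hypothetical choices made for the illustration.

```r
# Hypothetical helper: vectorised W_o with a numeric ker (divisor n assumed for S).
weighted_cov <- function(Z, loc, ker) {
  Zc <- sweep(Z, 2, loc)
  Sinv <- solve(crossprod(Zc) / nrow(Z))
  d2 <- rowSums((Zc %*% Sinv) * Zc)       # squared distances to loc in metric S^-
  w <- exp(-ker * d2)                     # kernel weights
  crossprod(Zc * sqrt(w)) / sum(w)        # W_o
}

set.seed(1)
Z <- rbind(matrix(rnorm(200), 100, 2),          # main cloud (rows 1 to 100)
           matrix(rnorm(10, mean = 8), 5, 2))   # five planted outliers (rows 101 to 105)
loc <- apply(Z, 2, median)

Wo <- weighted_cov(Z, loc, ker = 0.15)
d2o <- mahalanobis(Z, center = loc, cov = Wo)   # squared distance in the metric W_o^{-1}
flagged <- which(d2o > quantile(d2o, 0.95))     # hypothetical cut-off rule
Zclean <- Z[-flagged, , drop = FALSE]           # data to pass on to the W_g step
```

The cleaned matrix Zclean would then be used to compute W_g with a larger ker, following the recommendation above.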

Author(s)

Didier Leibovici c3s2i@free.fr

References

Caussinus, H and Ruiz, A (1990) Interesting Projections of Multidimensional Data by Means of Generalized Principal Components Analysis. COMPSTAT90, Physica-Verlag, Heidelberg, 121-126.

Faraj, A (1994) Interpretation tools for Generalized Discriminant Analysis. In: New Approaches in Classification and Data Analysis, Springer-Verlag, Heidelberg, 286-291.

Lebart, L (1969) Analyse statistique de la contiguïté. Publications de l'Institut de Statistique de l'Université de Paris, XVIII, 81-112.

See Also

SVDgen

Examples


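 library(PTAk)
 data(iris)
 iris2 <- as.matrix(iris[,1:4])
 dimnames(iris2)[[1]] <- as.character(iris[,5])

 # robust within-group covariance, then its inverse to be used as the metric
 D2 <- CauRuimet(iris2,ker=1,withingroup=TRUE)
 D2 <- Powmat(D2,(-1))
 # centre the columns, then run the generalised PCA with metric D2
 iris2 <- sweep(iris2,2,apply(iris2,2,mean))
 res <- SVDgen(iris2,D2=D2,D1=1)
 plot(res,nb1=1,nb2=2,cex=0.5)
 summary(res,testvar=0)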

 # the same in a demo function

 # demo.CauRuimet(ker=4,withingroup=TRUE,openX11s=FALSE)
 # demo.CauRuimet(ker=0.15,withingroup=FALSE,openX11s=FALSE)
