CauRuimet {PTAk} | R Documentation |
Gives a robust estimate of an unknown within group covariance, aiming either to look for dense groups or to sparse groups (outliers) according to local variance and weighting function choice.
CauRuimet(Z,ker=1,m0=1,withingroup=TRUE, loc=substitute(apply(Z,2,mean,trim=.1)),matrixmethod=TRUE, Nrandom=3000)
Z |
matrix |
ker |
either numerical or a function:
if numerical the weighting function is e^{(-ker ;t)}, otherwise
ker=function(t){return(expression)} is a positive decreasing function. |
m0 |
is a graph of neighbourhood or another proximity matrix, the hadamard product of the proximities will be operated |
withingroup |
logical,if TRUE the aim is to give a robust estimate for dense groups, if FALSE the
aim is to give a robust estimate for outliers |
loc |
a vector of locations or a function using mean, median, to give an estimate of it |
matrixmethod |
if TRUE (only with withingroup ) uses some matrix computation rather
than double looping as suggests the formula below |
Nrandom |
if Nrandom < dim(Z)[1] ) uses only a Nrandom sample from rows of Z and
m0 if applicable. |
When withingroup is TRUE
, local(defined by the weighting) variance formula is returned, aiming
at finding dense groups:
W_l=frac{sum_{i=1}^{n-1}sum_{j=i+1}^n m0_{ij}ker(d^2_{S^-}(Z_i,Z_j))(Z_i-Z_j)'(Z_i-Z_j)}{sum_{i=1}^{n-1}sum_{j=i+1}^n m0_{ij}ker(d^2_{S^-}(Z_i,Z_j))}
where d^2_{S^-}( . , .) is the squared euclidian distance with
S^- the inverse of a robust sample covariance (i.e. using loc
instead of the mean) ;
if FALSE
robust Total weighted variance or if m0
not 1 Global weighted variance, is returned:
W_o=frac{sum_{i=1}^nker(d^2_{S^-}(Z_i,tilde{Z}))(Z_i-tilde{Z})'(Z_i-tilde{Z})} {sum_{i=1}^n ker(d^2_{S^-}(Z_i,tilde{Z}))}
W_g=frac{sum_{i=1}^{n-1}sum_{j=i+1}^n m0_{ij}.ker(d^2_{S^-}(Z_i,Z_j))(Z_i-tilde{Z})'(Z_j-tilde{Z})} {sum_{i=1}^{n-1}sum_{j=i+1}^n m0_{ij}ker(d^2_{S^-}(Z_i,Z_j))}
where tilde{Z} is the vector loc
.
If m0
is a graph of neighbourhood and ker is the function returning 1 (no proximity due to
distance is used) the function will return (when withingroup=TRUE
) the local
variance-covariance matrix as define in Lebart(1969).
a matrix
As mentioned by Caussinus and Ruiz a good strategy to reveal dense groups with generalised PCA
would be to reveal outliers first using the metric W_o^{-1} and remove them before using the
metric W_l^{-1}. Based on theoretical considerations they recommand for the choice of
ker
, with the decreasing function e^{(-ker ;t)}: a lower bound of 1 if
withingroup
and something fairly small say in the interval [0.05;0.3] otherwise.
Didier Leibovici c3s2i@free.fr
Caussinus, H and Ruiz, A (1990) Interesting Projections of Multidimensional Data by Means of Generalized Principal Components Analysis. COMPSTAT90, Physica-Verlag, Heidelberg,121-126.
Faraj, A (1994) Interpretation tools for Generalized Discriminant Analysis.In: New Approches in Classification and Data Analysis, Springer-Verlag, 286-291, Heidelberg.
Lebart, L (1969) Analyse statistique de la contiguit<e9>e.Publication de l'Institut de Statistiques Universitaire de Paris, XVIII,81-112.
Leibovici D (2008) Spatio-temporal Multiway Decomposition using Principal Tensor Analysis on k-modes: the R package PTAk . to be submitted soon at Journal of Statisticcal Software.
data(iris) iris2 <- as.matrix(iris[,1:4]) dimnames(iris2)[[1]] <- as.character(iris[,5]) D2 <- CauRuimet(iris2,ker=1,withingroup=TRUE) D2 <- Powmat(D2,(-1)) iris2 <- sweep(iris2,2,apply(iris2,2,mean)) res <- SVDgen(iris2,D2=D2,D1=1) plot(res,nb1=1,nb2=2,cex=0.5) summary(res,testvar=0) # the same in a demo function # demo.CauRuimet(ker=4,withingroup=TRUE,openX11s=FALSE) # demo.Cauruimet(ker=0.15,withingroup=FALSE,openX11s=FALSE)