PCAproj {pcaPP} | R Documentation |
Computes a desired number of (robust) principal components using the algorithm of Croux and Ruiz-Gazen (JMVA, 2005).
PCAproj(x, k = 2, method = c("mad", "sd", "qn"), CalcMethod = c("eachobs", "lincomb", "sphere"), nmax = 1000, update = TRUE, scores = TRUE, maxit = 5, maxhalf = 5, scale = NULL, center = l1median, control)
x |
a numeric matrix or data frame which provides the data for the principal components analysis. |
k |
desired number of components to compute |
method |
scale estimator used to detect the direction with the largest
variance. Possible values are "sd", "mad" and "qn"; the
latter can also be given as "Qn". "mad" is the default value. |
CalcMethod |
the variant of the algorithm to be used. Possible values are
"eachobs", "lincomb" and "sphere", with "eachobs" being
the default. |
nmax |
maximum number of directions to search in each step (only when
using "sphere" or "lincomb" as the CalcMethod). |
update |
a logical value indicating whether an update algorithm should be used. |
scores |
a logical value indicating whether the scores of the principal components should be calculated. |
maxit |
maximum number of iterations. |
maxhalf |
maximum number of steps for angle halving. |
scale |
this argument indicates how the data is to be rescaled. It
can be a function like sd or mad or a vector
of length ncol(x) containing the scale value of each column. |
center |
this argument indicates how the data is to be centered. It
can be a function like mean or median or a vector
of length ncol(x) containing the center value of each column. |
control |
a list whose elements must be the same as (or a subset of) the parameters above. If the control object is supplied, the parameters from it will be used and any other given parameters are overridden. |
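The arguments above can be passed either directly or through a control list. The following is a minimal sketch, assuming the pcaPP package is installed; the simulated data and the particular settings are illustrative choices, not recommendations.

```r
library(pcaPP)

set.seed(3)
x <- matrix(rnorm(100 * 5), ncol = 5)   # illustrative data only

# Arguments given directly: coordinate-wise median centering, MAD scaling.
pc1 <- PCAproj(x, k = 3, method = "mad", center = median, scale = mad)

# Settings collected in a control list; these take precedence over
# any arguments given alongside the list.
ctrl <- list(k = 3, method = "qn", CalcMethod = "lincomb", nmax = 500)
pc2 <- PCAproj(x, control = ctrl)
```

Both calls return an object of class "princomp" with three components.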
Basically, this algorithm considers the directions of each observation
through the origin of the centered data as possible projection directions.
As this algorithm has some drawbacks, especially if ncol(x) > nrow(x)
in the data matrix, there are several improvements that can be used with this
algorithm.
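The basic idea can be sketched in base R for the first component only. This is not the pcaPP implementation: the coordinate-wise median is used here as a simple stand-in for the L1 median, the MAD is the scale estimator, and the names xc, dirs, loading1 and sdev1 are hypothetical.

```r
# Sketch of the "each observation is a candidate direction" idea
# (first component only; base R, not the pcaPP code).
set.seed(1)
x <- matrix(rnorm(200 * 3), ncol = 3) %*% diag(c(3, 1, 0.5))

# Assumption: coordinate-wise median as a stand-in for the L1 median.
xc <- sweep(x, 2, apply(x, 2, median))

# Candidate directions: each centered observation scaled to unit length.
norms <- sqrt(rowSums(xc^2))
keep  <- norms > 0
dirs  <- xc[keep, , drop = FALSE] / norms[keep]

# Robust spread (MAD) of the data projected onto every candidate direction.
proj   <- xc %*% t(dirs)        # n x (number of candidates)
spread <- apply(proj, 2, mad)

best     <- which.max(spread)
loading1 <- dirs[best, ]        # first loading vector (unit length)
sdev1    <- spread[best]        # robust scale along that direction
```

Because the candidates are limited to the observed directions, the search space has at most nrow(x) members, which is one way to see why the case ncol(x) > nrow(x) is problematic.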
CalcMethod "lincomb": similar to the "sphere" algorithm, but the new data points are generated using linear
combinations of the original data, b_1*x_1 + ... + b_n*x_n, where the
coefficients b_i come from a uniform distribution on the interval [0, 1].
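The linear-combination step described above can be sketched in base R. This is an illustration only, not the pcaPP code: the number of generated points (n_new) and the final scaling to unit length are assumptions made for the example.

```r
set.seed(2)
n <- 50; p <- 4
n_new <- 1000                               # hypothetical number of extra points
xc <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)

# Each row of B holds coefficients b_1, ..., b_n drawn from U[0, 1].
B <- matrix(runif(n_new * n), nrow = n_new, ncol = n)

# Row i of new_points is b_1*x_1 + ... + b_n*x_n for the i-th coefficient set.
new_points <- B %*% xc                      # n_new x p

# Assumption: the new points are scaled to unit length to act as directions.
new_dirs <- new_points / sqrt(rowSums(new_points^2))
```

The rows of new_dirs could then be evaluated as additional candidate directions alongside the observed ones.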
Similar to the function princomp, there is a print method for these objects
that prints the results in a nice format, and the plot method produces a
scree plot (screeplot). There is also a biplot method.
The function returns a list of class "princomp", i.e. a list similar to the
output of the function princomp.
sdev |
the (robust) standard deviations of the principal components. |
loadings |
the matrix of variable loadings (i.e., a matrix whose columns
contain the eigenvectors). This is of class "loadings":
see loadings for its print method. |
center |
the center values that were subtracted. |
scale |
the scalings applied to each variable. |
n.obs |
the number of observations. |
scores |
if scores = TRUE, the scores of the supplied data on the
principal components. |
call |
the matched call. |
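The components of the returned list can be inspected as with any "princomp" object. A short sketch, assuming pcaPP is installed and using simulated data chosen only for illustration:

```r
library(pcaPP)

set.seed(4)
x <- matrix(rnorm(100 * 4), ncol = 4)   # illustrative data only
pc <- PCAproj(x, k = 2)

pc$sdev            # robust standard deviations, one per component
dim(pc$loadings)   # variables x components
dim(pc$scores)     # observations x components (scores = TRUE by default)
pc$center          # center values that were subtracted
```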
Heinrich Fritz, Peter Filzmoser <P.Filzmoser@tuwien.ac.at>
C. Croux, P. Filzmoser, M. Oliveira, (2007). Algorithms for Projection-Pursuit Robust Principal Component Analysis, Chemometrics and Intelligent Laboratory Systems, Vol. 87, pp. 218-225.
# multivariate data with outliers
library(pcaPP)
library(mvtnorm)
x <- rbind(rmvnorm(200, rep(0, 6), diag(c(5, rep(1, 5)))),
           rmvnorm( 15, c(0, rep(20, 5)), diag(rep(1, 6))))

# Here we calculate the principal components with PCAproj
pc <- PCAproj(x, 6)

# we could draw a biplot too:
biplot(pc)

# we could use another calculation method and another objective function, and
# maybe only calculate the first three principal components:
pc <- PCAproj(x, 3, "qn", "sphere")
biplot(pc)

# now we want to compare the results with the non-robust principal components
pc <- princomp(x)

# again, a biplot for comparison:
biplot(pc)