PCAproj {pcaPP} R Documentation

Robust Principal Components using the algorithm of Croux and Ruiz-Gazen (2005)

Description

Computes a desired number of (robust) principal components using the algorithm of Croux and Ruiz-Gazen (JMVA, 2005).

Usage

PCAproj(x, k = 2, method = c("sd", "mad", "qn"), CalcMethod = c("eachobs",
"lincomb", "sphere"), nmax = 1000, update = TRUE, scores = TRUE, maxit = 5, 
maxhalf = 5, control, ...)

Arguments

x a numeric matrix or data frame which provides the data for the principal components analysis.
k desired number of components to compute
method scale estimator used to detect the direction with the largest variance. Possible values are "sd", "mad" and "qn"; the latter may also be given as "Qn". "mad" is the default value.
CalcMethod the variant of the algorithm to be used. Possible values are "eachobs", "lincomb" and "sphere", with "eachobs" being the default.
nmax maximum number of directions to search in each step (only when using "sphere" or "lincomb" as the CalcMethod).
update a logical value indicating whether an update algorithm should be used.
scores a logical value indicating whether the scores of the principal component should be calculated.
maxit maximum number of iterations.
maxhalf maximum number of steps for angle halving.
control a list whose elements must be the same as (or a subset of) the parameters above. If a control object is supplied, its values are used and override any parameters given directly. The ... argument is the exception: it cannot be passed as an element of the control object, and it is still honored when a control object is supplied.
... additional arguments passed to the function ScaleAdvR
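
The effect of the method argument can be illustrated without the package itself. The following is a minimal sketch in base R, assuming simulated data with a block of gross outliers (the sample sizes and the outlier value are arbitrary choices for illustration). It compares the classical standard deviation with the MAD; "qn" is omitted because base R does not provide that estimator.

```r
# Sketch: compare a classical and a robust scale estimate on contaminated data.
set.seed(1)
clean <- rnorm(200)                      # 200 draws from N(0, 1)
contaminated <- c(clean, rep(50, 15))    # append 15 gross outliers at 50

scales <- c(sd_clean  = sd(clean),
            sd_cont   = sd(contaminated),
            mad_clean = mad(clean),
            mad_cont  = mad(contaminated))
print(round(scales, 2))
```

In this setting the standard deviation is inflated considerably by the outliers while the MAD changes only slightly, which is why a robust scale estimator tends to be preferred when the direction of largest spread should not be driven by outliers.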

Details

This algorithm considers the directions of each observation through the origin of the centered data as possible projection directions. As this approach has some drawbacks, especially if ncol(x) > nrow(x) in the data matrix, there are several improvements that can be used with it.

update
An updating step based on the algorithm for finding the eigenvectors is added to the algorithm. This can be used with any CalcMethod.
sphere
Additional search directions are added using random directions. The random directions are determined using random data points generated from a p-dimensional multivariate standard normal distribution. These new data points are projected to the unit sphere, giving the new search directions.
lincomb
Additional search directions are added using linear combinations of the observations. It is similar to the "sphere"-algorithm, but the new data points are generated using linear combinations of the original data b_1*x_1 + ... + b_n*x_n where the coefficients b_i come from a uniform distribution in the interval [0, 1].
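
The three ways of generating candidate directions described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the data are simulated, centering uses column medians (the package may center differently), and unit_rows and best_direction are hypothetical helper names. Only the search for the first component is shown; the update step and the extraction of further components are left out.

```r
set.seed(42)
n <- 50; p <- 4; nmax <- 100
x <- matrix(rnorm(n * p), n, p) %*% diag(c(3, 1, 1, 1))

# Assumption: center by column medians (the package may center differently).
xc <- sweep(x, 2, apply(x, 2, median))

# Hypothetical helper: scale every row to unit length (assumes no all-zero row).
unit_rows <- function(m) m / sqrt(rowSums(m^2))

# "eachobs": the direction of each centered observation through the origin.
dirs_eachobs <- unit_rows(xc)

# "sphere": points from a p-dimensional standard normal distribution,
# projected onto the unit sphere.
dirs_sphere <- unit_rows(matrix(rnorm(nmax * p), nmax, p))

# "lincomb": linear combinations b_1*x_1 + ... + b_n*x_n of the (centered)
# observations with coefficients b_i drawn from U[0, 1].
b <- matrix(runif(nmax * n), nmax, n)
dirs_lincomb <- unit_rows(b %*% xc)

# Hypothetical helper: pick the candidate direction whose projected data
# have the largest scale according to scale_fun.
best_direction <- function(xc, dirs, scale_fun = mad) {
  proj <- xc %*% t(dirs)              # n x (number of candidate directions)
  s <- apply(proj, 2, scale_fun)      # one scale estimate per direction
  list(direction = dirs[which.max(s), ], scale = max(s))
}

# The extra directions are added to those given by the observations.
candidates <- rbind(dirs_eachobs, dirs_sphere)
pc1 <- best_direction(xc, candidates)
print(pc1)
```

With only the observation directions there are at most nrow(x) candidates, which is sparse coverage when ncol(x) > nrow(x); the additional directions are one way to widen the search.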

As with the function princomp, there is a print method for these objects that prints the results in a readable format, and the plot method produces a scree plot (screeplot). There is also a biplot method.

Value

The function returns a list of class "princomp", i.e. a list similar to the output of the function princomp.

sdev the (robust) standard deviations of the principal components.
loadings the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors). This is of class "loadings": see loadings for its print method.
center the means that were subtracted.
scale the scalings applied to each variable.
n.obs the number of observations.
scores if scores = TRUE, the scores of the supplied data on the principal components.
call the matched call.
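
Because the returned list is described as similar to the output of princomp, its layout can be previewed with princomp itself. The sketch below uses stats::princomp on the built-in USArrests data purely as a stand-in; it assumes, without verifying it against the package, that the same accessors apply to a PCAproj result.

```r
# Stand-in object: classical PCA on a built-in data set.
pc <- princomp(USArrests)

# Component names of the list.
print(names(pc))

# Proportion of the total variance attributed to each component.
explained <- pc$sdev^2 / sum(pc$sdev^2)
print(round(explained, 3))

# Loadings: one row per variable, one column per component.
print(dim(pc$loadings))

# Scores of the first three observations.
print(head(pc$scores, 3))
```

If fewer components than variables are requested through k, the sum of the squared sdev values would cover only the computed components, so the proportions would then be relative to that subset.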

Author(s)

Heinrich Fritz, Peter Filzmoser <P.Filzmoser@tuwien.ac.at>

References

C. Croux, P. Filzmoser, M. Oliveira (2004) Projection-pursuit Estimators for Robust Principal Component Analysis, Technical Report TS-04-4, Vienna University of Technology, Austria

See Also

PCAgrid, ScaleAdvR, princomp

Examples

  library(pcaPP)
  library(mvtnorm)   # provides rmvnorm

  # multivariate data with outliers
  x <- rbind(rmvnorm(200, rep(0, 6), diag(c(5, rep(1,5)))),
             rmvnorm( 15, c(0, rep(20, 5)), diag(rep(1, 6))))
  # Here we calculate the principal components with PCAproj
  pc <- PCAproj(x, 6)
  # we could draw a biplot too:
  biplot(pc)

  # we could use another calculation method and another objective function, and 
  # maybe only calculate the first three principal components:
  pc <- PCAproj(x, 3, "qn", "sphere")
  biplot(pc)

  # now we want to compare the results with the non-robust principal components
  pc <- princomp(x)
  # again, a biplot for comparison:
  biplot(pc)

[Package pcaPP version 1.0 Index]