prm_dcv {chemometrics}    R Documentation

Repeated double-cross-validation for robust PLS

Description

Evaluates robust PLS, called PRM (partial robust M-estimation), by repeated double cross-validation (double-CV).

Usage

prm_dcv(X,Y,ncomp=10,repl=10,segments0=4,segments=7,segment0.type="random",
  segment.type="random",sdfact=2,fairct=4,trim=0.2,opt="median",plot.opt=FALSE, ...) 

Arguments

X predictor matrix
Y response variable
ncomp number of PLS components
repl number of replications of the double-CV
segments0 the number of segments to use for splitting into training and test data, or a list with segments (see mvrCv)
segments the number of segments to use for selecting the optimal number of components, or a list with segments (see mvrCv)
segment0.type the type of segments to use. Ignored if 'segments0' is a list
segment.type the type of segments to use. Ignored if 'segments' is a list
sdfact factor for the multiplication of the standard deviation for the determination of the optimal number of components, see mvr_dcv
fairct tuning constant, by default fairct=4
trim trimming fraction for the computation of the trimmed SEP (the default 0.2 corresponds to 20 percent)
opt centering method: if "l1m", centering is done by the l1-median; if "median", by the coordinate-wise median
plot.opt if TRUE a plot will be generated that shows the selection of the optimal number of components for each step of the CV
... additional parameters

Details

In this cross-validation (CV) scheme, the optimal number of components is determined by an additional CV within each training set and then applied to the corresponding test set. The whole procedure is repeated repl times. The optimal number of components is the smallest number of components whose MSE is still within MSE+sdfact*sd(MSE), where MSE and sd(MSE) are taken at the minimum of the MSE.
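
The selection rule described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the code used inside prm_dcv: the helper name select_ncomp and the input vectors mse and mse_sd are hypothetical, and the numbers are made up. How prm_dcv obtains the MSE and its standard deviation from the CV segments is not shown here.

```r
# Hypothetical helper (not part of chemometrics): return the smallest number
# of components whose MSE is within min(MSE) + sdfact * sd(MSE at the minimum).
select_ncomp <- function(mse, mse_sd, sdfact = 2) {
  stopifnot(length(mse) == length(mse_sd), length(mse) > 0)
  i_min <- which.min(mse)                           # position of the smallest MSE
  threshold <- mse[i_min] + sdfact * mse_sd[i_min]  # tolerance band above the minimum
  min(which(mse <= threshold))                      # most parsimonious model inside the band
}

# Made-up values for 1 to 5 components, purely for illustration
mse    <- c(4.0, 2.5, 2.1, 2.0, 2.2)
mse_sd <- c(0.5, 0.3, 0.3, 0.2, 0.3)
a_opt  <- select_ncomp(mse, mse_sd, sdfact = 2)
```

With sdfact=0 the rule reduces to picking the number of components with the minimal MSE; larger values of sdfact favor models with fewer components.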

Value

b estimated regression coefficients
resopt array [nrow(Y) x ncol(Y) x repl] with residuals using optimum number of components
predopt array [nrow(Y) x ncol(Y) x repl] with predicted Y using optimum number of components
optcomp matrix [segments0 x repl] optimum number of components for each training set
residcomp array [nrow(Y) x ncomp x repl] with residuals for all numbers of components
pred array [nrow(Y) x ncol(Y) x ncomp x repl] with predicted Y for all numbers of components
SEPopt SEP over all residuals using optimal number of components
afinal final optimal number of components
SEPfinal vector of length ncomp with final SEP values; use the element afinal for the optimal SEP
SEPtrim final trimmed SEP value
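
As a rough sketch of how the components listed above might be read together, the following base R snippet collects the main numbers from a list shaped like the documented return value. The helper summarize_dcv and the object mock are hypothetical and the numbers are invented; with a real result, pass the object returned by prm_dcv instead of mock.

```r
# Hypothetical helper (not part of chemometrics): collect the headline numbers
# from a list that has the documented components afinal, SEPfinal, SEPtrim, optcomp.
summarize_dcv <- function(res) {
  list(
    ncomp_final    = res$afinal,                    # final optimal number of components
    sep_at_final   = res$SEPfinal[res$afinal],      # SEP at the final optimum
    sep_trimmed    = res$SEPtrim,                   # final trimmed SEP
    optcomp_counts = table(as.vector(res$optcomp))  # how often each number was selected
  )
}

# Mock object with invented numbers (segments0 = 4, repl = 2, ncomp = 3)
mock <- list(afinal = 2, SEPfinal = c(1.8, 1.1, 1.2), SEPtrim = 0.9,
             optcomp = matrix(c(2, 2, 3, 2, 1, 2, 2, 3), nrow = 4, ncol = 2))
s <- summarize_dcv(mock)
```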

Author(s)

Peter Filzmoser <P.Filzmoser@tuwien.ac.at>

References

K. Varmuza and P. Filzmoser: Introduction to Multivariate Statistical Analysis in Chemometrics. CRC Press, Boca Raton, FL, 2009.

See Also

mvr

Examples

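library(chemometrics)  # provides prm_dcv
library(pls)           # provides the yarn data set
data(yarn)
# repeated double-CV for PRM with at most 3 components and 2 replications
res <- prm_dcv(yarn$NIR,yarn$density,ncomp=3,repl=2)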

[Package chemometrics version 0.6 Index]