pls.regression {plsgenomics} | R Documentation |
The function pls.regression performs partial least squares (PLS) multivariate regression (with several response variables and several predictor variables) using de Jong's SIMPLS algorithm. This function is an adaptation of R. Wehrens' code from the package pls.pcr.
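The SIMPLS recursion can be sketched in base R as follows, for data whose columns are already centered and with unit-length weight vectors as in de Jong's original definition. This is not the plsgenomics or pls.pcr source: the function name `simpls_sketch` and its return list are hypothetical, chosen only to mirror the documented components B, T, P, Q and R.

```r
# Hypothetical SIMPLS sketch (not the plsgenomics implementation).
# X: column-centered (n x p) predictors; Y: column-centered (n x m) responses.
simpls_sketch <- function(X, Y, ncomp) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  p <- ncol(X)
  m <- ncol(Y)
  R <- matrix(0, p, ncomp)             # weight vectors
  scores <- matrix(0, nrow(X), ncomp)  # latent components (X-scores)
  P <- matrix(0, p, ncomp)             # X-loadings
  Q <- matrix(0, m, ncomp)             # Y-loadings
  V <- matrix(0, p, ncomp)             # orthonormal basis of the X-loadings
  S <- crossprod(X, Y)                 # cross-product matrix X'Y (p x m)
  for (a in seq_len(ncomp)) {
    # Weight vector: dominant left singular vector of S (length 1).
    r <- svd(S, nu = 1, nv = 0)$u[, 1]
    t_a <- drop(X %*% r)
    tt <- sum(t_a^2)
    P[, a] <- crossprod(X, t_a) / tt
    Q[, a] <- crossprod(Y, t_a) / tt
    # Orthonormalize the new X-loading against the previous ones.
    v <- P[, a]
    if (a > 1) {
      Vprev <- V[, seq_len(a - 1), drop = FALSE]
      v <- v - drop(Vprev %*% crossprod(Vprev, v))
    }
    v <- v / sqrt(sum(v^2))
    V[, a] <- v
    # Deflate S so that later latent components are orthogonal to t_a.
    S <- S - outer(v, drop(crossprod(v, S)))
    R[, a] <- r
    scores[, a] <- t_a
  }
  list(B = R %*% t(Q), T = scores, P = P, Q = Q, R = R)
}

# Usage on simulated, column-centered data.
set.seed(1)
X <- scale(matrix(rnorm(60), 20, 3), scale = FALSE)
Y <- scale(matrix(rnorm(40), 20, 2), scale = FALSE)
fit <- simpls_sketch(X, Y, ncomp = 3)
```

With as many components as predictors (and more observations than predictors), the sketch reproduces the ordinary least-squares coefficients, and its latent components are mutually orthogonal.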
pls.regression(Xtrain, Ytrain, Xtest=NULL, ncomp=NULL, unit.weights=TRUE)
Xtrain | a (ntrain x p) data matrix of predictors. Xtrain may be a matrix or a data frame. Each row corresponds to an observation and each column to a predictor variable. |
Ytrain | a (ntrain x m) data matrix of responses. Ytrain may be a vector (if m=1), a matrix or a data frame. If Ytrain is a matrix or a data frame, each row corresponds to an observation and each column to a response variable. If Ytrain is a vector, it contains the single response value for each observation. |
Xtest | a (ntest x p) matrix containing the predictors for the test data set. Xtest may also be a vector of length p (corresponding to a single test observation). |
ncomp | the number of latent components to be used for regression. If ncomp is a vector of integers, the regression model is built successively with each number of components. If ncomp=NULL, the maximal number of components min(ntrain,p) is chosen. |
unit.weights | if TRUE, the latent components are constructed from weight vectors standardized to length 1; otherwise, the latent components are standardized to norm 1 and the weight vectors are not constrained to length 1. |
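The input conventions above (matrix or data-frame inputs, a vector Ytrain when m=1, a length-p vector Xtest for a single test observation, and the ncomp=NULL default) can be mimicked in base R roughly as follows. The helper `prepare_pls_inputs` is hypothetical and only illustrates the resulting shapes; it is not part of plsgenomics.

```r
# Hypothetical helper (not part of plsgenomics): coerce inputs to the shapes
# described in the argument list.
prepare_pls_inputs <- function(Xtrain, Ytrain, Xtest = NULL, ncomp = NULL) {
  Xtrain <- as.matrix(Xtrain)   # matrix or data frame -> (ntrain x p) matrix
  Ytrain <- as.matrix(Ytrain)   # a vector becomes an (ntrain x 1) matrix
  p <- ncol(Xtrain)
  if (!is.null(Xtest)) {
    # A plain vector of length p is treated as one test observation.
    if (is.null(dim(Xtest))) Xtest <- matrix(Xtest, nrow = 1)
    Xtest <- as.matrix(Xtest)
    stopifnot(ncol(Xtest) == p)
  }
  # Documented default: the maximal number of components min(ntrain, p).
  if (is.null(ncomp)) ncomp <- min(nrow(Xtrain), p)
  list(Xtrain = Xtrain, Ytrain = Ytrain, Xtest = Xtest, ncomp = ncomp)
}

inp <- prepare_pls_inputs(
  Xtrain = data.frame(a = rnorm(5), b = rnorm(5), c = rnorm(5)),
  Ytrain = rnorm(5),
  Xtest = c(0.1, 0.2, 0.3)
)
dim(inp$Ytrain)  # 5 1
dim(inp$Xtest)   # 1 3
inp$ncomp        # 3
```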
The columns of the data matrices Xtrain and Ytrain need not be centered to have mean zero, since centering is performed by the function pls.regression as a preliminary step before the SIMPLS algorithm is run.
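A minimal base-R sketch of such a preliminary centering step, using `colMeans` and `sweep`; the helper name `center_columns` is hypothetical. It also shows that centering already-centered columns leaves them unchanged, so passing pre-centered data does no harm.

```r
# Hypothetical helper: subtract each column's mean and keep the means.
center_columns <- function(M) {
  M <- as.matrix(M)
  means <- colMeans(M)
  list(centered = sweep(M, 2, means), means = means)
}

set.seed(4)
X <- matrix(rnorm(30, mean = 3), 10, 3)
first <- center_columns(X)
# Centering a second time: the new column means are (numerically) zero,
# so the data are left unchanged.
second <- center_columns(first$centered)
```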
In the original definition of SIMPLS by de Jong (1993), the weight vectors have length 1. Weight vectors standardized to length 1 satisfy a simple optimality criterion (de Jong, 1993). However, it is also common (and computationally efficient) to standardize the latent components to have length 1 instead.
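The two standardizations differ only by a scalar factor per component, so they produce the same latent directions and the same fitted values. A base-R sketch for the first component of simulated, centered data (all variable names are illustrative, not those used inside the package):

```r
set.seed(2)
X <- scale(matrix(rnorm(50), 10, 5), scale = FALSE)
Y <- scale(matrix(rnorm(20), 10, 2), scale = FALSE)

# First SIMPLS weight direction: dominant left singular vector of X'Y.
r <- svd(crossprod(X, Y), nu = 1, nv = 0)$u[, 1]

# Analogue of unit.weights = TRUE: the weight vector has length 1.
r_w <- r
t_w <- drop(X %*% r_w)

# Analogue of unit.weights = FALSE: rescale so the latent component has norm 1.
r_c <- r / sqrt(sum(t_w^2))
t_c <- drop(X %*% r_c)

# One-component fits: projection of Y onto the latent component.
fit_w <- outer(t_w, drop(crossprod(t_w, Y))) / sum(t_w^2)
fit_c <- outer(t_c, drop(crossprod(t_c, Y)))  # sum(t_c^2) equals 1
```

Both fits agree; the unit-component version simply skips the division by the squared norm, which is one source of its computational convenience.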
In contrast to the original version found in the package pls.pcr, the prediction for the observations from Xtest is performed after centering the columns of Xtest by subtracting the column means calculated from Xtrain.
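This prediction rule can be sketched in base R as below. The helper `predict_centered` is hypothetical; adding the training response means back to the centered predictions is an assumption about how predictions are returned on the original scale, and the least-squares B is only a stand-in for a PLS coefficient matrix.

```r
# Hypothetical helper: center Xtest with the TRAINING column means, apply B,
# then (assumption) add back the training response means.
predict_centered <- function(Xtest, B, meanX, meanY) {
  if (is.null(dim(Xtest))) Xtest <- matrix(Xtest, nrow = 1)  # one observation
  Xtest_c <- sweep(as.matrix(Xtest), 2, meanX)
  sweep(Xtest_c %*% B, 2, meanY, "+")
}

set.seed(3)
Xtrain <- matrix(rnorm(40, mean = 5), 20, 2)
Ytrain <- matrix(rnorm(20, mean = 10), 20, 1)
meanX <- colMeans(Xtrain)
meanY <- colMeans(Ytrain)
Xc <- sweep(Xtrain, 2, meanX)
Yc <- sweep(Ytrain, 2, meanY)
B <- solve(crossprod(Xc), crossprod(Xc, Yc))  # stand-in (p x m) coefficients

Ypred <- predict_centered(Xtrain, B, meanX, meanY)
Ypred_one <- predict_centered(Xtrain[1, ], B, meanX, meanY)
```

Centering with the training means (rather than the test set's own means) keeps a single test observation, or a small or shifted test set, on the same coordinate system as the fitted model.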
A list with the following components:
B | the (p x m x length(ncomp)) array containing the regression coefficients. Each row corresponds to a predictor variable and each column to a response variable. The third dimension of B corresponds to the number of PLS components used to compute the regression coefficients. If ncomp has length 1, B is just a (p x m) matrix. |
Ypred | the (ntest x m x length(ncomp)) array containing the predicted values of the response variables for the observations from Xtest. The third dimension of Ypred corresponds to the number of PLS components used to compute the regression coefficients. |
P | the (p x max(ncomp)) matrix containing the X-loadings. |
Q | the (m x max(ncomp)) matrix containing the Y-loadings. |
T | the (ntrain x max(ncomp)) matrix containing the X-scores (latent components). |
R | the (p x max(ncomp)) matrix containing the weights used to construct the latent components. |
meanX | the p-vector containing the means of the columns of Xtrain. |
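Because the SIMPLS latent components are mutually orthogonal, and assuming Q holds the least-squares coefficients of Y on each latent component, the coefficients for k components are obtained by truncation: B_k = R[, 1:k] %*% t(Q[, 1:k]). The sketch below fills a (p x m x length(ncomp)) array this way. The helper `coef_array` and the random R and Q are placeholders; plsgenomics may assemble B differently.

```r
# Hypothetical helper: one coefficient matrix per requested number of components.
coef_array <- function(R, Q, ncomp) {
  p <- nrow(R)
  m <- nrow(Q)
  B <- array(0, dim = c(p, m, length(ncomp)))
  for (i in seq_along(ncomp)) {
    k <- seq_len(ncomp[i])
    B[, , i] <- R[, k, drop = FALSE] %*% t(Q[, k, drop = FALSE])
  }
  # As documented, a single ncomp value gives a plain (p x m) matrix.
  if (length(ncomp) == 1) B <- matrix(B, p, m)
  B
}

# Placeholder weights (p x 3) and Y-loadings (m x 3), not real SIMPLS output.
set.seed(5)
R <- matrix(rnorm(4 * 3), 4, 3)
Q <- matrix(rnorm(2 * 3), 2, 3)
B <- coef_array(R, Q, ncomp = 1:3)
dim(B)                            # 4 2 3
dim(coef_array(R, Q, ncomp = 2))  # 4 2
```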
Anne-Laure Boulesteix (http://www.slcmsr.net/boulesteix) and Korbinian Strimmer (http://strimmerlab.org/).
Adapted in part from pls.pcr code by R. Wehrens (http://cran.r-project.org/src/contrib/Descriptions/pls.html).
S. de Jong (1993). SIMPLS: an alternative approach to partial least squares regression, Chemometrics Intell. Lab. Syst. 18, 251–263.
C. J. F. ter Braak and S. de Jong (1998). The objective function of partial least squares regression, Journal of Chemometrics 12, 41–54.
pls.lda, TFA.estimate, pls.regression.cv.
# load plsgenomics library
library(plsgenomics)

# load the Ecoli data
data(Ecoli)

# perform pls regression
# with unit latent components
pls.regression(Xtrain=Ecoli$CONNECdata, Ytrain=Ecoli$GEdata, Xtest=Ecoli$CONNECdata, ncomp=1:3, unit.weights=FALSE)

# with unit weight vectors
pls.regression(Xtrain=Ecoli$CONNECdata, Ytrain=Ecoli$GEdata, Xtest=Ecoli$CONNECdata, ncomp=1:3, unit.weights=TRUE)