spls {integrOmics} | R Documentation |
Function to perform sparse Partial Least Squares (sPLS). The sPLS approach performs integration and variable selection simultaneously on two data sets in a one-step strategy.
spls(X, Y, ncomp = 2, mode = c("regression", "canonical"), max.iter = 500, tol = 1e-06, keepX = c(rep(ncol(X), ncomp)), keepY = c(rep(ncol(Y), ncomp)), scaleY = TRUE)
X |
numeric matrix of predictors. NAs are allowed. |
Y |
numeric vector or matrix of responses (for multi-response models).
NAs are allowed. |
ncomp |
the number of components to include in the model (see Details).
By default, components from one up to the rank of X are fitted. |
mode |
character string. The type of algorithm to use, (partially) matching
one of "regression" or "canonical". See Details. |
max.iter |
integer, the maximum number of iterations. |
tol |
a positive real, the tolerance used in the iterative algorithm. |
keepX |
numeric vector of length ncomp, the number of variables
to keep in X-loadings. By default all variables are kept in the model. |
keepY |
numeric vector of length ncomp, the number of variables
to keep in Y-loadings. By default all variables are kept in the model. |
scaleY |
logical. Should the Y data be scaled? In the case of a 'discriminant' version of the sPLS
where the Y data are of discrete type, this should be set to FALSE. |
... |
not used currently. |
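The effect of keepX and keepY on a loading vector can be pictured with a small base R sketch. It assumes the selection is done by soft-thresholding the loadings, as in the sparse methods listed in the References; soft_keep is a hypothetical helper written for illustration and is not a function of the package.

```r
# Hypothetical helper (not part of integrOmics): soft-threshold a loading
# vector so that at most `keep` entries remain non-zero.
soft_keep <- function(loading, keep) {
  p <- length(loading)
  if (keep >= p) return(loading)  # nothing to remove: all variables kept
  # Use the (keep + 1)-th largest absolute loading as the threshold
  lambda <- sort(abs(loading), decreasing = TRUE)[keep + 1]
  # Shrink every entry towards zero; entries at or below lambda become 0
  sign(loading) * pmax(abs(loading) - lambda, 0)
}

u <- c(0.9, -0.1, 0.4, -0.7, 0.05)   # toy loading vector for 5 variables
u_sparse <- soft_keep(u, keep = 2)   # keep 2 variables on this component
selected <- which(u_sparse != 0)     # indices of the retained variables
```

With ties among the absolute loadings, fewer than keep entries may remain non-zero, which is why the helper only guarantees "at most".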
The spls function fits sPLS models with 1, ..., ncomp components.
Multi-response models are fully supported. The X and Y datasets
can contain missing values.
The type of algorithm to use is specified with the mode argument. Two sPLS
algorithms are available: sPLS regression ("regression") and sPLS canonical
analysis ("canonical") (see References).
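The partial matching of mode described above is what base R's match.arg provides. The sketch below assumes spls resolves mode this way, which the usage line suggests but the source does not confirm; pick_mode is a hypothetical stand-in, not the package code.

```r
# Hypothetical stand-in showing how a `mode` argument with two choices
# is typically resolved in R (assumption: spls does something similar).
pick_mode <- function(mode = c("regression", "canonical")) {
  match.arg(mode)  # first choice when missing, partial matching otherwise
}

default_mode <- pick_mode()        # no value supplied
short_mode   <- pick_mode("can")   # unambiguous abbreviation
bad_mode     <- try(pick_mode("pls"), silent = TRUE)  # not a valid choice
```

An unrecognised value raises an error instead of silently falling back to a default.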
The number of components to fit is specified with the argument ncomp.
If this is not supplied, the rank of X is used. The rank is computed
using the mat.rank function.
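As a rough illustration of the rank used as an upper bound for ncomp, the following base R sketch estimates the numerical rank of a matrix from its singular values. This is not the mat.rank implementation, whose internals are not described here; the tolerance is an assumed, commonly used choice.

```r
set.seed(1)
# Toy predictor matrix: 20 samples, 3 independent columns
X <- matrix(rnorm(20 * 3), nrow = 20)
# Add a 4th column that is a linear combination of the first two,
# so the matrix has 4 columns but only 3 independent directions
X <- cbind(X, X[, 1] + X[, 2])

# Numerical rank: number of singular values above a small tolerance
# (assumed tolerance: largest dimension * largest singular value * eps)
d <- svd(X)$d
tol <- max(dim(X)) * max(d) * .Machine$double.eps
rank_svd <- sum(d > tol)

# Base R's QR decomposition offers a comparable estimate
rank_qr <- qr(X)$rank
```

Both estimates are bounded by min(nrow(X), ncol(X)), so ncomp cannot usefully exceed that value.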
spls returns an object of class "spls", a list that contains the following components:
X |
the centered and standardized original predictor matrix. |
Y |
the centered and standardized original response vector or matrix. |
ncomp |
the number of components included in the model. |
mode |
the algorithm used to fit the model. |
keepX |
number of X variables kept in the model on each component. |
keepY |
number of Y variables kept in the model on each component. |
mat.c |
matrix of coefficients to be used internally by predict. |
variates |
list containing the variates. |
loadings |
list containing the estimated loadings for the X and
Y variates. |
names |
list containing the names to be used for individuals and variables. |
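The X and Y components listed above are described as centered and standardized. A minimal base R sketch of that preprocessing, assuming it behaves like scale() applied column by column with missing values ignored, is given below; the package may compute it differently.

```r
# Toy matrix with one missing value in the first column
X <- matrix(c(1, 2, NA, 4,
              10, 20, 30, 40), ncol = 2)

# Center each column to mean 0 and scale it to unit standard deviation;
# scale() leaves NA entries in place and ignores them in the statistics
X_std <- scale(X, center = TRUE, scale = TRUE)

col_means <- colMeans(X_std, na.rm = TRUE)       # approximately 0
col_sds   <- apply(X_std, 2, sd, na.rm = TRUE)   # approximately 1
```

Setting scale = FALSE in this sketch would correspond to leaving a discrete Y unscaled, as recommended for scaleY in the 'discriminant' case.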
Sébastien Déjean, Ignacio González and Kim-Anh Lê Cao.
Lê Cao, K.-A., Martin, P.G.P., Robert-Granié, C. and Besse, P. (2009). Sparse canonical methods for biological data integration: application to a cross-platform study. BMC Bioinformatics 10:34.
Lê Cao, K.-A., Rossouw, D., Robert-Granié, C. and Besse, P. (2008). A sparse PLS for variable selection when integrating Omics data. Statistical Applications in Genetics and Molecular Biology 7, article 35.
Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis 99, 1015-1034.
Tenenhaus, M. (1998). La régression PLS: théorie et pratique. Paris: Editions Technic.
Wold, H. (1966). Estimation of principal components and related models by iterative least squares. In: Krishnaiah, P. R. (editor), Multivariate Analysis. Academic Press, N.Y., 391-420.
pls, summary, mat.rank, plotIndiv, plotVar.
data(liver.toxicity)
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic
toxicity.spls <- spls(X, Y, ncomp = 3, keepX = c(50, 50, 50), keepY = c(10, 10, 10))