gsim.cv {plsgenomics} | R Documentation |
The function gsim.cv
determines the best ridge regularization parameter and bandwidth to be
used for classification with GSIM as described in Lambert-Lacroix and Peyre (2005).
gsim.cv(Xtrain, Ytrain, LambdaRange, hARange, hB=NULL, NbIterMax=50)
Xtrain |
a (ntrain x p) data matrix of predictors. Xtrain must be a matrix.
Each row corresponds to an observation and each column to a predictor variable. |
Ytrain |
a vector of length ntrain containing the responses. Ytrain must be a vector.
Ytrain is a {1,2}-valued vector and contains the response variable for each
observation. |
LambdaRange |
a vector of positive real values from which the best ridge regularization parameter is chosen by cross-validation. |
hARange |
a vector of strictly positive real values from which the best bandwidth for GSIM step A is chosen by cross-validation. |
hB |
a strictly positive real value. hB is the bandwidth for
GSIM step B. If hB is NULL, then the value of hB is chosen using a
plug-in method. |
NbIterMax |
a positive integer. NbIterMax is the maximal number of
iterations in the Newton-Raphson parts. |
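The constraints listed above can be checked before calling gsim.cv. The following is a minimal sketch only: check_gsim_cv_args is a hypothetical helper, not part of plsgenomics, and it takes the conservative reading that every value in LambdaRange and hARange must be strictly greater than zero.

```r
# Hypothetical helper (NOT part of plsgenomics): checks the argument
# constraints stated in the argument descriptions above.
check_gsim_cv_args <- function(Xtrain, Ytrain, LambdaRange, hARange) {
  stopifnot(
    is.matrix(Xtrain), is.numeric(Xtrain),          # predictors: numeric matrix
    is.vector(Ytrain), length(Ytrain) == nrow(Xtrain),
    all(Ytrain %in% c(1, 2)),                       # responses coded in {1,2}
    is.numeric(LambdaRange), all(LambdaRange > 0),  # conservative: strictly > 0
    is.numeric(hARange), all(hARange > 0)
  )
  invisible(TRUE)
}

# Labels stored as 0/1 can be shifted to the {1,2} coding expected here.
set.seed(1)
y01 <- c(0, 1, 1, 0, 1)
Ytrain <- y01 + 1
Xtrain <- matrix(rnorm(5 * 3), nrow = 5, ncol = 3)
ok <- check_gsim_cv_args(Xtrain, Ytrain,
                         LambdaRange = c(0.1, 1), hARange = c(7, 20))
```

A response vector still coded as 0/1 fails the check with an error, which is easier to diagnose than a failure inside the fitting routine.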
The cross-validation procedure described in Lambert-Lacroix and Peyre (2005)
is used to determine the best ridge regularization parameter and bandwidth to be
used for classification with GSIM for binary data (for categorical data see
mgsim and mgsim.cv).
At each cross-validation run, Xtrain is split into a pseudo training
set (ntrain - 1 samples) and a pseudo test set (1 sample), that is, a leave-one-out split, and the classification error rate is determined for each
pair of candidate values of the ridge regularization parameter and the bandwidth. Finally, the function
gsim.cv returns the values of the ridge regularization parameter and
bandwidth for which the mean classification error rate is minimal.
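The leave-one-out grid search described above can be sketched in a few lines of R. This is an illustration under stated assumptions, not the code used by gsim.cv: loo_grid_search and toy_fit_predict are hypothetical names, and the classifier is a simple stand-in (a ridge-regularized linear index followed by a Gaussian kernel smoother), not the GSIM algorithm.

```r
# Hypothetical stand-in classifier (NOT GSIM): ridge-regularized linear
# index, then a Gaussian kernel estimate of P(Y = 2) with bandwidth h.
toy_fit_predict <- function(Xtr, Ytr, xte, lambda, h) {
  ctr <- colMeans(Xtr)
  Xc <- sweep(Xtr, 2, ctr)
  beta <- as.vector(solve(crossprod(Xc) + lambda * diag(ncol(Xtr)),
                          crossprod(Xc, Ytr - mean(Ytr))))
  idx_tr <- as.vector(Xc %*% beta)   # index of pseudo training samples
  idx_te <- sum((xte - ctr) * beta)  # index of the held-out sample
  w <- dnorm((idx_tr - idx_te) / h)  # kernel weights
  p2 <- if (sum(w) > 0) sum(w * (Ytr == 2)) / sum(w) else mean(Ytr == 2)
  if (p2 > 0.5) 2 else 1
}

# Leave-one-out search over every (lambda, h) pair.
loo_grid_search <- function(X, Y, lambda_range, h_range,
                            fit_predict = toy_fit_predict) {
  n <- nrow(X)
  err <- matrix(0, length(lambda_range), length(h_range),
                dimnames = list(as.character(lambda_range),
                                as.character(h_range)))
  for (i in seq_len(n)) {            # sample i is the pseudo test set
    for (a in seq_along(lambda_range)) {
      for (b in seq_along(h_range)) {
        pred <- fit_predict(X[-i, , drop = FALSE], Y[-i], X[i, ],
                            lambda_range[a], h_range[b])
        err[a, b] <- err[a, b] + (pred != Y[i])
      }
    }
  }
  err <- err / n                     # mean classification error rate
  best <- which(err == min(err), arr.ind = TRUE)[1, ]  # first minimizer
  list(Lambda = lambda_range[best[1]], h = h_range[best[2]], error = err)
}

# Usage on simulated two-class data (assumed purely for illustration).
set.seed(42)
X <- rbind(matrix(rnorm(15 * 5), 15, 5),
           matrix(rnorm(15 * 5, mean = 2), 15, 5))
Y <- rep(c(1, 2), each = 15)
res <- loo_grid_search(X, Y, lambda_range = c(0.1, 1), h_range = c(0.1, 0.5))
```

Ties in the error matrix are broken here by taking the first minimizer in column-major order; how gsim.cv resolves ties is not stated in this page.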
A list with the following components:
Lambda |
the optimal regularization parameter. |
hA |
the optimal bandwidth parameter. |
Sophie Lambert-Lacroix (http://www-lmc.imag.fr/lmc-sms/Sophie.Lambert) and Julie Peyre (http://www-lmc.imag.fr/lmc-sms/Julie.Peyre/).
S. Lambert-Lacroix, J. Peyre (2006). Local likelihood regression in generalized linear single-index models with applications to microarray data. Computational Statistics and Data Analysis, vol 51, n 3, 2091-2113.
# load plsgenomics library
library(plsgenomics)

# load Colon data
data(Colon)
IndexLearn <- c(sample(which(Colon$Y==2),12),sample(which(Colon$Y==1),8))
Xtrain <- Colon$X[IndexLearn,]
Ytrain <- Colon$Y[IndexLearn]
Xtest <- Colon$X[-IndexLearn,]

# preprocess data
resP <- preprocess(Xtrain=Xtrain, Xtest=Xtest, Threshold=c(100,16000),
                   Filtering=c(5,500), log10.scale=TRUE, row.stand=TRUE)

# determine optimum h and lambda
hl <- gsim.cv(Xtrain=resP$pXtrain, Ytrain=Ytrain, hARange=c(7,20),
              LambdaRange=c(0.1,1), hB=NULL)

# perform prediction by GSIM
res <- gsim(Xtrain=resP$pXtrain, Ytrain=Ytrain, Xtest=resP$pXtest,
            Lambda=hl$Lambda, hA=hl$hA, hB=NULL)
res$Cvg
# number of misclassified test samples
sum(res$Ytest!=Colon$Y[-IndexLearn])