sda {sda} | R Documentation |
sda trains a linear discriminant analysis (LDA) or diagonal discriminant analysis (DDA) classifier using Stein-type shrinkage estimation.
predict.sda performs the corresponding class prediction.
sda(Xtrain, L, diagonal=FALSE, fdr=FALSE, plot.fdr=FALSE, verbose=TRUE)

## S3 method for class 'sda':
predict(object, Xtest, feature.idx, verbose=TRUE, ...)
Xtrain |
A matrix containing the training data set. Note that the rows are sample observations and the columns are variables. |
L |
A factor with the class labels of the training samples. |
diagonal |
Chooses between LDA (default, diagonal=FALSE) and DDA (diagonal=TRUE). |
fdr |
Compute false discovery rate (FDR) values for each feature. |
plot.fdr |
Show a plot of the estimated FDR values. |
verbose |
Report shrinkage intensities (sda) and number of used features (predict.sda). |
object |
An sda fit object obtained from the function sda. |
Xtest |
A matrix containing the test data set. |
feature.idx |
A vector indicating which features to employ for prediction (if unspecified all features will be used). |
... |
Additional arguments for generic predict. |
In order to train the LDA or DDA classifier, three separate shrinkage estimators are employed: freqs.shrink from Hausser and Strimmer (2008) for the class frequencies, var.shrink from Opgen-Rhein and Strimmer (2007) for the variances, and invcor.shrink from Schäfer and Strimmer (2005) for the inverse correlation matrix. These estimates are plugged into the LDA and DDA discriminant functions for prediction. Note that the three corresponding regularization parameters are obtained analytically, without resorting to computer-intensive resampling.
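To show where the plugged-in estimates enter, here is a minimal base-R sketch of the LDA discriminant function. It is an illustration under stated assumptions, not the package's implementation: plain empirical estimates stand in for the three shrinkage estimators, and the toy data and the names X, lab and lda_scores are hypothetical.

```r
# Toy data (hypothetical): 2 classes, 3 features, 10 samples per class
set.seed(1)
X <- rbind(matrix(rnorm(30, mean = 0), ncol = 3),
           matrix(rnorm(30, mean = 3), ncol = 3))
lab <- factor(rep(c("a", "b"), each = 10))

# Assumption: empirical estimates replace freqs.shrink, var.shrink
# and invcor.shrink, which the package uses instead.
prior <- setNames(as.vector(table(lab)) / length(lab), levels(lab))

# Class means, one column per class (features x classes)
mu <- sapply(levels(lab), function(k) colMeans(X[lab == k, , drop = FALSE]))

# Pooled within-class covariance and its inverse
Xc <- X - t(mu[, as.integer(lab)])
Sigma <- crossprod(Xc) / (nrow(X) - nlevels(lab))
Sigma_inv <- solve(Sigma)

# LDA discriminant score of a sample x for every class k:
# mu_k' Sigma^-1 x - 0.5 * mu_k' Sigma^-1 mu_k + log(prior_k)
lda_scores <- function(x) {
  sapply(levels(lab), function(k) {
    m <- mu[, k]
    drop(crossprod(m, Sigma_inv %*% x)) -
      0.5 * drop(crossprod(m, Sigma_inv %*% m)) +
      log(prior[[k]])
  })
}

# Score a new sample (here: the mean of class "b") and normalise
x_new <- mu[, "b"]
d <- lda_scores(x_new)
posterior <- exp(d - max(d)) / sum(exp(d - max(d)))
pred <- names(which.max(d))
```

Replacing Sigma with diag(diag(Sigma)) before inverting gives the diagonal (DDA) variant of the same sketch.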
sda trains the classifier and returns an sda object with the following components needed for the subsequent prediction:
regularization |
a vector containing the three estimated shrinkage intensities, |
prior |
the estimated class frequencies, |
predcoef |
matrix containing the coefficients used for prediction, and |
ranking |
matrix containing the "correlation-adjusted t scores" (cat scores) for each feature and group. The overall ranking of a feature is determined by the sum of its squared cat scores across all groups. |
predict.sda predicts the class of each test sample and returns a list with two components:
class |
a factor with the most probable class assignment, and |
posterior |
a matrix containing the class posterior probabilities for each test sample. |
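The ranking rule described for the ranking component (sum of squared cat scores across groups) can be sketched in a few lines of base R. The matrix below is made up for illustration; cat_scores, score and idx are hypothetical names and do not claim to match the layout of the object returned by sda.

```r
# Hypothetical cat scores: 4 features (rows) x 3 groups (columns)
cat_scores <- matrix(c( 2.0, -1.0,  0.5,
                       -0.2,  0.1,  0.3,
                        1.5,  1.5, -3.0,
                        0.0, -2.5,  1.0),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(paste0("feature", 1:4),
                                     c("g1", "g2", "g3")))

# Overall score per feature: sum of squared cat scores across groups
score <- rowSums(cat_scores^2)

# Feature indices ordered from highest to lowest overall score
idx <- order(score, decreasing = TRUE)
```

Here feature3 ranks first (score 13.5) and feature2 last (score 0.14), so idx is 3, 4, 1, 2.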
Miika Ahdesmäki and Korbinian Strimmer (http://strimmerlab.org).
freqs.shrink, var.shrink, invcor.shrink.
# load sda library
library("sda")

# load full Khan et al (2001) data set
data(khan2001)
dim(khan2001$x)
levels(khan2001$y)

# create data set containing only the SRBCT samples
del.idx = which( khan2001$y == "non-SRBCT" )
srbct.x = khan2001$x[-del.idx,]
srbct.y = factor(khan2001$y[-del.idx])
dim(srbct.x)
levels(srbct.y)

# divide into training and test data
train.x = srbct.x[1:63,]
train.y = srbct.y[1:63]
test.x = srbct.x[64:83,]
test.y = srbct.y[64:83]

###################################################
# classification with correlation (shrinkage LDA) #
###################################################

sda.fit = sda(train.x, train.y)
ynew = predict(sda.fit, test.x)$class # using all 2308 features
sum(ynew != test.y) # 0

sda.fit$ranking[1:20,]
fidx = sda.fit$ranking[1:20,"idx"]
ynew = predict(sda.fit, test.x, feature.idx = fidx)$class # using the top 20 features
sum(ynew != test.y) # 1

###########################################################
# classification with diagonal covariance (shrinkage DDA) #
###########################################################

sda.fit = sda(train.x, train.y, diagonal=TRUE)
ynew = predict(sda.fit, test.x)$class # using all 2308 features
sum(ynew != test.y) # 4

sda.fit$ranking[1:20,]
fidx = sda.fit$ranking[1:20,"idx"]
ynew = predict(sda.fit, test.x, feature.idx = fidx)$class # using the top 20 features
sum(ynew != test.y) # 2
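
The error counts above come from sum(ynew != test.y). As a small supplementary sketch, the same comparison can also be summarised as an error rate and a confusion table in base R. The label vectors below are hypothetical stand-ins for test.y and ynew, so the block runs without the package or the data set.

```r
# Hypothetical true and predicted labels (stand-ins for test.y and ynew)
truth <- factor(c("A", "A", "B", "C", "C", "C"), levels = c("A", "B", "C"))
pred  <- factor(c("A", "B", "B", "C", "C", "C"), levels = levels(truth))

# Number and proportion of misclassified samples
n_err    <- sum(pred != truth)
err_rate <- mean(pred != truth)

# Confusion table: rows are predictions, columns are true classes
conf <- table(predicted = pred, actual = truth)
print(conf)
```

Off-diagonal cells of conf show which classes are confused with each other, which a single error count does not reveal.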