sda.ranking {sda} | R Documentation |
sda.ranking determines a ranking of features by computing cat scores between the group centroids and the pooled mean.
plot.sda.ranking provides a graphical visualization of the top-ranking features.
sda.ranking(Xtrain, L, diagonal=FALSE, fdr=TRUE, plot.fdr=FALSE, verbose=TRUE)

## S3 method for class 'sda.ranking':
plot(x, top=40, ...)
Xtrain |
A matrix containing the training data set. Note that the rows correspond to observations and the columns to variables. |
L |
A factor with the class labels of the training samples. |
diagonal |
Chooses between LDA (default, diagonal=FALSE) and DDA (diagonal=TRUE). |
fdr |
Compute FDR values and HC scores for each feature. |
plot.fdr |
Show plot with estimated FDR values. |
verbose |
Print out some info while computing. |
x |
An "sda.ranking" object – this is produced by the sda.ranking() function. |
top |
The number of top-ranking features shown in the plot (default: 40). |
... |
Additional arguments for generic plot. |
For each feature and centroid, a shrinkage cat score of the group mean versus the pooled mean is computed. The overall ranking of a feature is determined by the sum of the squared cat scores across all centroids. In the diagonal case (DDA) the cat scores reduce to t-scores. Thus, in the two-class diagonal case, the features are simply ranked according to their (shrinkage) t-scores.
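The scoring rule can be illustrated for the diagonal case with plain base R. This is only a sketch on simulated data: it uses ordinary (non-shrinkage) estimates and assumes the scale factor sqrt(1/n_k - 1/n) for the difference between a centroid and the pooled mean, so its numbers will not match the output of sda.ranking. All variable names are made up for the illustration.

```r
# Toy data: 20 observations (rows) x 5 features (columns), two classes
set.seed(1)
X <- matrix(rnorm(100), nrow = 20, ncol = 5)
L <- factor(rep(c("a", "b"), each = 10))
X[L == "b", 3] <- X[L == "b", 3] + 3     # make feature 3 informative

n  <- nrow(X)
nk <- setNames(as.vector(table(L)), levels(L))   # group sizes
mu <- colMeans(X)                                # pooled mean per feature

# centroids: one row per group, one column per feature
cent <- t(sapply(levels(L), function(g) colMeans(X[L == g, , drop = FALSE])))

# pooled within-group variance per feature (plain estimate, no shrinkage)
resid <- X - cent[as.integer(L), , drop = FALSE]
v <- colSums(resid^2) / (n - nlevels(L))

# t-score of each centroid versus the pooled mean (features x groups);
# the scale factor is an assumption of this sketch
tscore <- sapply(levels(L), function(g) {
  (cent[g, ] - mu) / sqrt(v * (1 / nk[[g]] - 1 / n))
})

score   <- rowSums(tscore^2)             # sum of squared scores across centroids
ranking <- order(score, decreasing = TRUE)
ranking                                  # feature numbers, best first
```

With two groups of equal size the two columns of tscore differ only in sign, so ranking by the summed squares is the same as ranking by the absolute two-sample t-score.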
Calling sda.ranking should be step 1 in a classification analysis. Steps 2 and 3 are sda and predict.sda.
See Ahdesmäki and Strimmer (2009) for details. For the case of two classes see Zuber and Strimmer (2009).
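The three steps might be chained as in the following sketch. It assumes the sda package is installed (the code is skipped otherwise). The cutoff of 100 features is arbitrary, and predicting on the training data is done only to keep the example short.

```r
# Sketch of the three-step workflow; skipped if the sda package is not installed.
if (requireNamespace("sda", quietly = TRUE)) {
  library("sda")
  data(singh2002)
  Xtrain <- singh2002$x
  Ytrain <- singh2002$y

  # Step 1: rank the features (FDR computation switched off to keep it short)
  ranking <- sda.ranking(Xtrain, Ytrain, diagonal = FALSE,
                         fdr = FALSE, verbose = FALSE)

  # keep the 100 top-ranking features (100 is an arbitrary choice)
  selected <- ranking[1:100, "idx"]

  # Step 2: train the classifier on the selected features only
  fit <- sda(Xtrain[, selected], Ytrain, diagonal = FALSE, verbose = FALSE)

  # Step 3: predict class labels (here, for illustration, on the training data)
  pred <- predict(fit, Xtrain[, selected], verbose = FALSE)
  table(predicted = pred$class, observed = Ytrain)
}
```

In a real analysis the ranking and the cutoff would be determined on training data only, and step 3 would be applied to separate test data.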
sda.ranking returns a matrix with the following columns:
idx |
original feature number |
score |
sum of the squared cat scores - this determines the overall ranking |
cat |
for each group and feature the cat score of the centroid versus the pooled mean |
If fdr=TRUE then additionally local false discovery rate (FDR) values as well as higher criticism (HC) scores are computed for each feature (using fdrtool).
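As a small illustration of how these columns might be used for feature selection, the following base R sketch works on a mock matrix. The values are invented for the example and are not output of sda.ranking; the column names "lfdr" and "HC" are assumed to be those used for the additional FDR and HC columns.

```r
# Mock ranking matrix with invented values (NOT real sda.ranking output);
# rows are assumed to be sorted by decreasing score.
ranking <- cbind(
  idx   = c(12, 3, 27, 8, 19),
  score = c(41.0, 25.3, 9.8, 2.1, 0.4),
  lfdr  = c(0.01, 0.05, 0.35, 0.90, 1.00),
  HC    = c(2.1, 3.4, 2.9, 1.2, 0.3)
)

# original feature numbers with local FDR below 0.8
useful <- ranking[ranking[, "lfdr"] < 0.8, "idx"]

# feature set up to the position where the HC score is maximal
n.hc   <- which.max(ranking[, "HC"])
hc.set <- ranking[seq_len(n.hc), "idx"]
```

Indexing by the "idx" column rather than by row position matters here, because the rows are ordered by rank and not by original feature number.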
Miika Ahdesmäki and Korbinian Strimmer (http://strimmerlab.org).
Ahdesmäki, M., and K. Strimmer. 2009. Feature selection in "omics" prediction problems using cat scores and false non-discovery rate control. See http://arxiv.org/abs/0903.2003 for publication details.
Zuber, V., and K. Strimmer. 2009. Gene ranking and biomarker discovery under correlation. See http://arxiv.org/abs/0902.0751 for publication details.
# load sda library
library("sda")

#################
# training data #
#################

# prostate cancer set
data(singh2002)

# training data
Xtrain = singh2002$x
Ytrain = singh2002$y

#########################################
# feature ranking (diagonal covariance) #
#########################################

# ranking using t-scores (DDA)
ranking.DDA = sda.ranking(Xtrain, Ytrain, diagonal=TRUE)
ranking.DDA[1:10,]

# plot t-scores for the top 40 genes
plot(ranking.DDA, top=40)

# number of features with local FDR < 0.8
# (i.e. features useful for prediction)
sum(ranking.DDA[,"lfdr"] < 0.8)

# number of features with local FDR < 0.2
# (i.e. significant non-null features)
sum(ranking.DDA[,"lfdr"] < 0.2)

# optimal feature set according to HC score
plot(ranking.DDA[,"HC"], type="l")
which.max( ranking.DDA[1:1000,"HC"] )

#####################################
# feature ranking (full covariance) #
#####################################

# ranking using cat-scores (LDA)
ranking.LDA = sda.ranking(Xtrain, Ytrain, diagonal=FALSE)
ranking.LDA[1:10,]

# plot cat-scores for the top 40 genes
plot(ranking.LDA, top=40)

# number of features with local FDR < 0.8
# (i.e. features useful for prediction)
sum(ranking.LDA[,"lfdr"] < 0.8)

# number of features with local FDR < 0.2
# (i.e. significant non-null features)
sum(ranking.LDA[,"lfdr"] < 0.2)

# optimal feature set according to HC score
plot(ranking.LDA[,"HC"], type="l")
which.max( ranking.LDA[1:1000,"HC"] )