sda.ranking {sda}    R Documentation

Shrinkage Discriminant Analysis 1: Feature Ranking

Description

sda.ranking determines a ranking of features by computing cat scores between the group centroids and the pooled mean.

plot.sda.ranking provides a graphical visualization of the top-ranking features.

Usage

sda.ranking(Xtrain, L, diagonal=FALSE, fdr=TRUE, plot.fdr=FALSE, verbose=TRUE)
## S3 method for class 'sda.ranking':
plot(x, top=40, ...)

Arguments

Xtrain A matrix containing the training data set. Note that the rows correspond to observations and the columns to variables.
L A factor with the class labels of the training samples.
diagonal Chooses between LDA (default, diagonal=FALSE) and DDA (diagonal=TRUE).
fdr Compute FDR values and HC scores for each feature.
plot.fdr Show plot with estimated FDR values.
verbose Print out some info while computing.
x An "sda.ranking" object – this is produced by the sda.ranking() function.
top The number of top-ranking features shown in the plot (default: 40).
... Additional arguments for generic plot.

Details

For each feature and centroid a shrinkage cat score of the group mean versus the pooled mean is computed. The overall ranking of a feature is determined by the sum of the squared cat scores across all centroids. For the diagonal case (DDA) the cat score reduces to the t-score. Thus in the two-class diagonal case the features are simply ranked according to the (shrinkage) t-scores.
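The ranking criterion described above can be written compactly. The notation is introduced here for illustration only and is not taken from the package: t_{j,k} denotes the shrinkage cat score of feature j for group k, K is the number of groups, and S_j is the resulting ranking score of feature j.

```latex
S_j = \sum_{k=1}^{K} t_{j,k}^{2}
```

Features are ranked by decreasing S_j. In the diagonal case each t_{j,k} is the corresponding shrinkage t-score, so the same formula applies with t-scores in place of cat scores.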

Calling sda.ranking should be step 1 in a classification analysis. Steps 2 and 3 are sda and predict.sda.

See Ahdesmäki and Strimmer (2009) for details. For the case of two classes see Zuber and Strimmer (2009).

Value

sda.ranking returns a matrix with the following columns:

idx original feature number
score sum of the squared cat scores - this determines the overall ranking
cat for each group and feature the cat score of the centroid versus the pooled mean


If fdr=TRUE then local false discovery rate (FDR) values and higher criticism (HC) scores are additionally computed for each feature (using fdrtool). In the examples below these are accessed as the columns "lfdr" and "HC".
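As a rough sketch of how a matrix with this layout can be used, the following base R snippet builds a small stand-in for the return value and extracts the original indices of the top-ranking features. It does not call sda.ranking: the numbers are made up, and the column names cat.A and cat.B are hypothetical placeholders for the per-group cat score columns.

```r
# Toy stand-in for an sda.ranking result (values are made up for illustration).
# "cat.A" and "cat.B" are hypothetical names for the per-group cat score columns.
cat.scores = cbind(cat.A = c(2.0, -0.5, 1.0, -3.0),
                   cat.B = c(-2.0, 0.5, -1.0, 3.0))

# score = sum of the squared cat scores across groups
score = rowSums(cat.scores^2)

# order features by decreasing score, keeping the original feature numbers
ord = order(score, decreasing = TRUE)
toy.ranking = cbind(idx = ord, score = score[ord],
                    cat.scores[ord, , drop = FALSE])

# original feature numbers of the two top-ranking features
top.idx = toy.ranking[1:2, "idx"]
```

With a real ranking, the idx column can be used in the same way to subset the columns of Xtrain before the later analysis steps.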

Author(s)

Miika Ahdesmäki and Korbinian Strimmer (http://strimmerlab.org).

References

Ahdesmäki, M., and K. Strimmer. 2009. Feature selection in "omics" prediction problems using cat scores and false non-discovery rate control. See http://arxiv.org/abs/0903.2003 for publication details.

Zuber, V., and K. Strimmer. 2009. Gene ranking and biomarker discovery under correlation. See http://arxiv.org/abs/0902.0751 for publication details.

See Also

sda, predict.sda.

Examples

# load sda library
library("sda")

################# 
# training data #
#################

# prostate cancer set
data(singh2002)

# training data
Xtrain = singh2002$x
Ytrain = singh2002$y

######################################### 
# feature ranking (diagonal covariance) #
#########################################

# ranking using t-scores (DDA)
ranking.DDA = sda.ranking(Xtrain, Ytrain, diagonal=TRUE)
ranking.DDA[1:10,]

# plot t-scores for the top 40 genes
plot(ranking.DDA, top=40) 

# number of features with local FDR < 0.8 
# (i.e. features useful for prediction)
sum(ranking.DDA[,"lfdr"] < 0.8)

# number of features with local FDR < 0.2 
# (i.e. significant non-null features)
sum(ranking.DDA[,"lfdr"] < 0.2)

# optimal feature set according to HC score
plot(ranking.DDA[,"HC"], type="l")
which.max( ranking.DDA[1:1000,"HC"] ) 

##################################### 
# feature ranking (full covariance) #
#####################################

# ranking using cat-scores (LDA)
ranking.LDA = sda.ranking(Xtrain, Ytrain, diagonal=FALSE)
ranking.LDA[1:10,]

# plot cat scores for the top 40 genes
plot(ranking.LDA, top=40) 

# number of features with local FDR < 0.8 
# (i.e. features useful for prediction)
sum(ranking.LDA[,"lfdr"] < 0.8)

# number of features with local FDR < 0.2 
# (i.e. significant non-null features)
sum(ranking.LDA[,"lfdr"] < 0.2)

# optimal feature set according to HC score
plot(ranking.LDA[,"HC"], type="l")
which.max( ranking.LDA[1:1000,"HC"] ) 


[Package sda version 1.1.0 Index]