msc.features.remove {caMassClass} | R Documentation |
Remove Highly Correlated Features. The function checks neighbor features looking for highly correlated ones and removes one of them. Used in order to drop dimensionality of the data.
msc.features.remove(Data, Auc, ccMin=0.9, verbose=FALSE)
Data |
Data containing one sample per row and one feature per column. |
Auc |
A measure of usefulness of each column/feature, used to choose
which one of two highly correlated columns to remove. Usually a measure of
discrimination power of each feature as measured by
colAUC , student t-test or other method. See details. |
ccMin |
Minimum correlation coefficient of "highly correlated" columns. |
verbose |
Boolean flag turns debugging printouts on. |
If colAUC
was used and there were more than two classes present
than Auc
is a matrix with multiple measurements for each feature. In
such a case Auc = apply(Auc, 2, mean)
is run in order to extract a single
measure per feature. If other measures are desired, like
Auc = apply(Auc, 2, max)
, than they should be called beforehand.
Vector of column indexes to be kept.
Jarek Tuszynski (SAIC) jaroslaw.w.tuszynski@saic.com
msc.classifier.test
and msc.features.select
functions.
colAUC
# load "Data_IMAC.Rdata" file containing raw MS spectra 'X' if (!file.exists("Data_IMAC.Rdata")) example("msc.project.read") load("Data_IMAC.Rdata") X = t(X[,,1]) auc = colAUC(X,SampleLabels) quantile(auc) cidx = msc.features.remove(X, auc, verbose=TRUE) Y = X[,cidx] stopifnot( dim(Y)==c(20, 3516) ) stopifnot( abs(mean(auc)-0.64)<0.01 )