msc.features.remove {caMassClass}    R Documentation

Remove Highly Correlated Features

Description

The function compares neighboring features, looks for highly correlated pairs, and removes one feature from each such pair. It is used to reduce the dimensionality of the data.

Usage

msc.features.remove(Data, Auc, ccMin=0.9, verbose=FALSE)

Arguments

Data Data containing one sample per row and one feature per column.
Auc A measure of the usefulness of each column/feature, used to choose which of two highly correlated columns to remove. Usually a measure of the discrimination power of each feature, as computed by colAUC, Student's t-test or another method. See Details.
ccMin Minimum correlation coefficient at which two columns are considered "highly correlated".
verbose Boolean flag; if TRUE, debugging printouts are turned on.

Details

If colAUC was used and more than two classes were present, then Auc is a matrix with multiple measurements for each feature. In that case Auc = apply(Auc, 2, mean) is run in order to extract a single measure per feature. If another summary is desired, such as Auc = apply(Auc, 2, max), then it should be computed beforehand and the resulting vector passed as Auc.
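
The difference between the two summaries can be seen on a small made-up matrix. This is only an illustrative sketch: the values and the feature names f1, f2, f3 are hypothetical, and the layout (one row per class pair, one column per feature) is assumed from the apply(Auc, 2, ...) calls above.

```r
# Hypothetical AUC matrix: one row per class pair, one column per feature.
Auc <- rbind(
  c(0.9, 0.6, 0.7),
  c(0.5, 0.8, 0.7),
  c(0.7, 0.7, 0.7)
)
colnames(Auc) <- c("f1", "f2", "f3")

# Mean over class pairs: the collapse described in Details.
auc_mean <- apply(Auc, 2, mean)

# Maximum over class pairs: an alternative that must be computed beforehand.
auc_max <- apply(Auc, 2, max)

print(auc_mean)
print(auc_max)
```

With these values all three features have the same mean, while the maximum ranks f1 above f2 above f3, so the choice of summary can change which of two correlated columns is kept.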

Value

Vector of column indexes to be kept.
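
To make the shape of the return value concrete, the following is a minimal sketch of one possible neighbor-based scheme. It is not the caMassClass implementation: the helper remove_correlated_neighbors, the rule of comparing each column with the most recently kept one, and the use of the signed Pearson correlation are all assumptions made for illustration.

```r
# Hypothetical helper, NOT the package's code: walk over the columns, compare
# each one with the most recently kept column and, when the two are highly
# correlated, keep only the one with the higher score.
remove_correlated_neighbors <- function(Data, score, ccMin = 0.9) {
  keep <- 1L
  for (j in seq_len(ncol(Data))[-1]) {
    last <- keep[length(keep)]
    cc <- cor(Data[, last], Data[, j])
    if (!is.na(cc) && cc >= ccMin) {
      # Highly correlated pair: retain whichever column scores higher.
      if (score[j] > score[last]) keep[length(keep)] <- j
    } else {
      # Not highly correlated: column j starts a new kept entry.
      keep <- c(keep, j)
    }
  }
  keep
}

set.seed(1)
a <- rnorm(50)
b <- rnorm(50)
# Columns 1 and 2 are near-duplicates; column 3 is independent noise.
Data <- cbind(a, a + 1e-3 * rnorm(50), b)
score <- c(0.6, 0.8, 0.7)

cidx <- remove_correlated_neighbors(Data, score)
Y <- Data[, cidx, drop = FALSE]  # reduced data, same use as X[, cidx]
print(cidx)
```

In this toy case the first two columns form a highly correlated pair, so only the higher-scoring one (column 2) is retained alongside column 3. The returned vector is used for subsetting in the same way as the result of msc.features.remove.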

Author(s)

Jarek Tuszynski (SAIC) jaroslaw.w.tuszynski@saic.com

See Also

Examples

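  library(caMassClass)

  # load "Data_IMAC.Rdata" file containing raw MS spectra 'X'
  if (!file.exists("Data_IMAC.Rdata")) example("msc.project.read")
  load("Data_IMAC.Rdata")

  # use the first copy of the spectra, transposed to one sample per row
  X = t(X[,,1])
  # score each feature by its AUC with respect to the sample labels
  auc = colAUC(X,SampleLabels)
  quantile(auc)
  # indexes of the columns to keep after dropping highly correlated neighbors
  cidx = msc.features.remove(X, auc, verbose=TRUE)
  Y = X[,cidx]
  # sanity checks on the reduced data and on the mean AUC
  stopifnot( dim(Y)==c(20, 3516) )
  stopifnot( abs(mean(auc)-0.64)<0.01 )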

[Package caMassClass version 1.6 Index]