msc.classifier.run {caMassClass}    R Documentation

Train and Test Chosen Classifier.

Description

Common interface for training and testing several standard classifiers. Includes optional feature selection and feature scaling steps. Also allows specifying that some test samples are multiple copies of the same sample and should therefore receive the same label.

Usage

msc.classifier.run( xtrain, ytrain, xtest, ret.prob=FALSE, 
    RemCorrCol=0, KeepCol=0, prior=1, same.sample=NULL,
    ScaleType=c("none", "min-max", "avr-std", "med-mad"),
    method=c("svm", "nnet", "lda", "qda", "LogitBoost", "rpart"), ...) 

Arguments

xtrain A matrix or data frame with training data. Rows contain samples and columns contain features/variables.
ytrain Class labels for the training data samples. A response vector with one label for each row of xtrain. Can be a factor, a character vector or a numeric vector.
xtest A matrix or data frame with test data. Rows contain samples and columns contain features/variables.
ret.prob If set to TRUE, then the posterior probabilities for each class are returned as an attribute called "probabilities".
same.sample Optional parameter specifying that some (or all) test samples have multiple copies, which should be used together to predict a single label for all of them. Can be a factor, a character vector or a numeric vector, with unique values for different samples and identical values for copies of the same sample.
RemCorrCol If non-zero, then some of the highly correlated columns are removed using the msc.features.remove function with ccMin=RemCorrCol.
KeepCol Controls the removal of columns with low AUC.
  • if KeepCol is smaller than 0.5 (including the default of 0) - do nothing
  • if KeepCol is in the range [0.5, 1] - keep columns with AUC bigger than KeepCol
  • if KeepCol is bigger than one - keep the top "KeepCol" number of columns
ScaleType Optional parameter selecting how features are scaled. The following types are recognized:
  • "none" - no scaling is performed
  • "min-max" - data minimum is mapped to 0 and maximum is mapped to 1
  • "avr-std" - data is mapped to zero mean and unit variance
  • "med-mad" - data is mapped to zero median and unit mad (median absolute deviation)
prior Class weights. The following types are recognized:
  • prior==1 - all samples in all classes have equal weight (default)
  • prior==2 - all classes have equal weight
  • prior is a vector - a named vector of weights for the different classes, used for asymmetric class sizes.
method Classifier to be used. The following ones are recognized (each followed by some parameters that could be passed through ...):
  • "svm" - see svm from e1071 package. Possible parameters: cost, gamma
  • "nnet" - see nnet from nnet package. Possible parameters: size, decay, maxit
  • "LogitBoost" - see LogitBoost from caTools package. Possible parameter: nIter
  • "lda" - see lda from MASS package. Possible parameters: method
  • "qda" - see qda from MASS package. Possible parameters: method
  • "rpart" - see rpart from rpart package. Possible parameters: minsplit, cp, maxdepth
... Additional parameters to be passed to classifiers. See method for suggestions.
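
The KeepCol and ScaleType rules above can be illustrated with a small base-R sketch. This is not the package's own code: the helper names scale_feature and keep_columns are hypothetical, the AUC values are made up, and the use of R's default mad() (which includes the 1.4826 consistency constant) is an assumption about what "unit mad" means.

```r
# Hypothetical helper: rescale one numeric feature vector.
# The type names mirror the ScaleType options; the formulas are an
# illustration of the descriptions above, not the package internals.
scale_feature <- function(x, type = c("none", "min-max", "avr-std", "med-mad")) {
  type <- match.arg(type)
  switch(type,
    "none"    = x,                                   # leave data untouched
    "min-max" = (x - min(x)) / (max(x) - min(x)),    # map to [0, 1]
    "avr-std" = (x - mean(x)) / sd(x),               # zero mean, unit variance
    "med-mad" = (x - median(x)) / mad(x)             # zero median, unit mad (assumed default mad)
  )
}

# Hypothetical helper: pick column indices to keep, given one AUC per column.
keep_columns <- function(auc, KeepCol = 0) {
  if (KeepCol < 0.5) {
    seq_along(auc)                                   # do nothing: keep all columns
  } else if (KeepCol <= 1) {
    which(auc > KeepCol)                             # keep columns above the AUC threshold
  } else {
    n <- min(length(auc), floor(KeepCol))
    sort(order(auc, decreasing = TRUE)[seq_len(n)])  # keep the n columns with highest AUC
  }
}

# Usage on the numeric iris columns and on made-up AUC values
scaled <- as.data.frame(lapply(iris[, 1:4], scale_feature, type = "min-max"))
auc    <- c(0.55, 0.92, 0.71, 0.98)                  # illustrative values only
kept_threshold <- keep_columns(auc, 0.7)             # columns 2, 3 and 4
kept_top2      <- keep_columns(auc, 2)               # columns 2 and 4
```

Each feature is scaled separately in this sketch, so a column with a very large range does not dominate the others.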

Details

This function performs the following steps:

Value

Predicted class labels for each sample in xtest. If ret.prob=TRUE, then the posterior probabilities of each sample belonging to each class are returned as an attribute called "probabilities". The returned probabilities do not take into account the same.sample variable, which is used only to synchronize the predicted labels.
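
As a rough illustration of the return value, the sketch below builds a mock result by hand instead of calling msc.classifier.run, so it runs in base R alone. The matrix layout of the attribute (one row per test sample, one column per class), the majority-vote rule and the helper name sync_labels are all assumptions made for this example; the page does not say how labels of copies are actually synchronized.

```r
# Mock return value: predicted labels carrying a "probabilities" attribute.
# All numbers are made up for illustration only.
pred <- factor(c("a", "b", "a", "b"), levels = c("a", "b"))
attr(pred, "probabilities") <- matrix(
  c(0.9, 0.1,
    0.4, 0.6,
    0.7, 0.3,
    0.2, 0.8),
  ncol = 2, byrow = TRUE, dimnames = list(NULL, c("a", "b")))

# Reading the attribute back (assumed layout: samples in rows, classes in columns)
prob <- attr(pred, "probabilities")

# Hypothetical synchronization: majority vote within each group of copies.
# Ties go to the label that comes first in the sorted table.
sync_labels <- function(labels, groups) {
  winner <- tapply(as.character(labels), groups,
                   function(l) names(which.max(table(l))))
  factor(unname(winner[as.character(groups)]), levels = levels(labels))
}

same.sample <- c("s1", "s1", "s1", "s2")   # first three test rows are copies
synced <- sync_labels(pred, same.sample)   # labels become a, a, a, b
```

Note that prob is left unchanged by the synchronization step, which matches the statement above that the returned probabilities ignore same.sample.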

Note

This function is not fully tested and might change in future versions.

Author(s)

Jarek Tuszynski (SAIC) jaroslaw.w.tuszynski@saic.com

References

See Also

Examples

  library(caMassClass)  # msc.classifier.run
  library(caTools)      # sample.split
  data(iris)
  mask   = sample.split(iris[,5], SplitRatio=1/4) # very few points to train
  xtrain = iris[ mask,-5]  # use output of sample.split to ...
  xtest  = iris[!mask,-5]  # ... create train and test subsets
  ytrain = iris[ mask, 5] 
  ytest  = iris[!mask, 5] 
  # confusion tables on the training data itself, one per classifier
  table(ytrain, msc.classifier.run(xtrain,ytrain,xtrain, method="svm") )
  table(ytrain, msc.classifier.run(xtrain,ytrain,xtrain, method="nnet") )
  table(ytrain, msc.classifier.run(xtrain,ytrain,xtrain, method="lda") )
  table(ytrain, msc.classifier.run(xtrain,ytrain,xtrain, method="qda") )
  table(ytrain, msc.classifier.run(xtrain,ytrain,xtrain, method="LogitBoost") )
  # confusion table on the held-out test data
  table(ytest, msc.classifier.run(xtrain,ytrain,xtest, method="lda") )
  
  # check that LogitBoost reproduces every training label
  a=table(ytrain, msc.classifier.run(xtrain,ytrain,xtrain, method="LogitBoost") )
  stopifnot(  sum(diag(a))==length(ytrain) )

[Package caMassClass version 1.6 Index]