msc.features.scale {caMassClass}    R Documentation

Scale Classification Data

Description

Scales the features of data to be used for classification. Scaling parameters are computed from each column (feature) of the training set and then applied to both the training and the test sets.
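Read together with the three types listed under Arguments, the Description amounts to a per-column affine map whose parameters come from the training rows only. The formula below is a sketch under that reading; the symbols c_j (centre) and s_j (spread) are notation introduced here, not names used by the package.

```latex
x'_{ij} = \frac{x_{ij} - c_j}{s_j},
\qquad c_j,\ s_j \ \text{computed from}\ \{\, x_{kj} : k \in \text{xtrain},\ x_{kj}\ \text{finite} \,\},
```

with the same c_j and s_j applied to every row i of both xtrain and xtest.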

Usage

msc.features.scale(xtrain, xtest, type = c("min-max", "avr-std", "med-mad"))

Arguments

xtrain A matrix or data frame with training data. Rows contain samples and columns contain features (variables).
xtest A matrix or data frame with test data. Rows contain samples and columns contain the same features (variables) as xtrain.
type Scaling type; one of the following:
  • "min-max" - the minimum of each feature is mapped to 0 and its maximum to 1
  • "avr-std" - each feature is mapped to zero mean and unit variance
  • "med-mad" - each feature is mapped to zero median and unit MAD (median absolute deviation)
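The three types can be sketched in R as a per-column centre and spread computed from the training data and then applied unchanged to both sets. scale_features_sketch is a hypothetical name and this is not the caMassClass source. It assumes finite inputs and no constant columns, and for "med-mad" it assumes R's mad() with its default consistency constant 1.4826, which this page does not specify.

```r
# Hypothetical sketch, NOT the caMassClass implementation: it only
# illustrates the three documented scaling types. Assumes finite inputs
# and no constant columns (a zero spread would divide by zero).
scale_features_sketch <- function(xtrain, xtest,
                                  type = c("min-max", "avr-std", "med-mad")) {
  type   <- match.arg(type)          # defaults to "min-max"
  xtrain <- as.matrix(xtrain)
  xtest  <- as.matrix(xtest)
  # One column per feature: row 1 = centre, row 2 = spread (training data only)
  params <- apply(xtrain, 2, function(v) switch(type,
    "min-max" = c(min(v), max(v) - min(v)),
    "avr-std" = c(mean(v), sd(v)),
    "med-mad" = c(median(v), mad(v))))  # assumption: mad() with constant 1.4826
  # Subtract the centre, then divide by the spread, column by column
  apply_params <- function(x)
    sweep(sweep(x, 2, params[1, ]), 2, params[2, ], "/")
  list(xtrain = apply_params(xtrain), xtest = apply_params(xtest))
}

# Usage: parameters come from the first 100 iris rows only
data(iris)
x <- scale_features_sketch(iris[1:100, -5], iris[101:150, -5])
print(range(x$xtrain))  # exactly 0 to 1
print(range(x$xtest))   # may fall outside [0, 1]
```

Because the test set is scaled with training parameters, its values are not guaranteed to land in [0, 1] (or at zero mean and unit variance); that is the intended behaviour, since the test data must be treated exactly as unseen data would be.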

Details

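Many classification algorithms perform better if the input data are scaled beforehand. Some perform scaling internally (for example, svm), but many do not. For others, such as rpart or LogitBoost, scaling makes no difference: these tree-based methods split each feature at a threshold, so their results depend only on the ordering of each feature's values, which an increasing per-feature transformation like the ones above does not change.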
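The claim about tree-based methods can be checked with a small R sketch that fits rpart on raw and on min-max-scaled iris data and compares the test-set predictions. The random split and the inline min-max code are assumptions made for illustration; msc.features.scale itself is not called, so the sketch runs without caMassClass installed.

```r
library(rpart)  # recommended package shipped with R

data(iris)
set.seed(42)
idx   <- sample(nrow(iris), 50)   # random training subset (illustrative)
train <- iris[idx, ]
test  <- iris[-idx, ]
feat  <- 1:4                      # numeric feature columns

# Min-max parameters computed from the training rows only
lo  <- sapply(train[feat], min)
rng <- sapply(train[feat], max) - lo
minmax <- function(v, l, r) (v - l) / r
train_s <- train
test_s  <- test
train_s[feat] <- Map(minmax, train[feat], lo, rng)
test_s[feat]  <- Map(minmax, test[feat],  lo, rng)

# Same tree learner on raw and on scaled features
fit_raw    <- rpart(Species ~ ., data = train)
fit_scaled <- rpart(Species ~ ., data = train_s)
p_raw    <- predict(fit_raw,    test,   type = "class")
p_scaled <- predict(fit_scaled, test_s, type = "class")

# Splits depend only on the ordering of values, so predictions should agree
print(all(p_raw == p_scaled))
```

The same comparison with svm would generally not give identical predictions, because its kernel distances change with the scale of each feature.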

If xtrain contains NA, NaN, or infinite values, all non-finite values are omitted when computing the scaling parameters.
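A minimal R sketch of this rule for the "min-max" type, using a hypothetical helper finite_minmax that is not part of caMassClass: the parameters are computed from the finite entries only. This page does not say what happens to the non-finite entries themselves; in this sketch they simply pass through the arithmetic as NA or Inf.

```r
# Hypothetical helper (not from caMassClass): min-max parameters computed
# from the finite training values only.
finite_minmax <- function(v) {
  v <- v[is.finite(v)]   # drops NA, NaN, Inf and -Inf
  c(min = min(v), max = max(v))
}

v <- c(2, 4, NA, 6, Inf, 10)     # toy training column
p <- finite_minmax(v)            # min = 2, max = 10 (NA and Inf ignored)
scaled <- (v - p[["min"]]) / (p[["max"]] - p[["min"]])
print(scaled)                    # 0, 0.25, NA, 0.5, Inf, 1
```

Without the is.finite() filter, max(v) would be Inf and min(v) NA, and every scaled value would become NA.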

Value

A list with the following components:

xtrain A matrix or data frame with the scaled training data.
xtest A matrix or data frame with the scaled test data.

Author(s)

Jarek Tuszynski (SAIC) jaroslaw.w.tuszynski@saic.com

See Also

Used by the msc.classifier.test and msc.features.select functions.

Examples

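  library(caMassClass)  # provides msc.features.scale
  library(caTools)      # provides sample.split
  library(e1071)        # provides svm
  data(iris)
  set.seed(1)           # make the random train/test split reproducible
  mask   = sample.split(iris[,5], SplitRatio=1/4) # very few points to train
  xtrain = iris[ mask,-5]  # use output of sample.split to ...
  xtest  = iris[!mask,-5]  # create train and test subsets
  ytrain = iris[ mask, 5]
  ytest  = iris[!mask, 5]
  x = msc.features.scale(xtrain, xtest)      # parameters come from xtrain only
  model = svm(x$xtrain, ytrain, scale=FALSE) # SVM on scaled features
  print(a <- table(predict(model, x$xtest), ytest) )
  model = svm(xtrain, ytrain, scale=FALSE)   # SVM on raw features, for comparison
  print(b <- table(predict(model, xtest), ytest) )
  # Compare test-set accuracies; which one is higher depends on the random split
  cat("accuracy, scaled:", sum(diag(a))/sum(a), "  raw:", sum(diag(b))/sum(b), "\n")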

[Package caMassClass version 1.6 Index]