msc.features.scale {caMassClass}    R Documentation
Scale features of the data to be used for classification. Scaling factors are extracted from each column/feature of the train data-set and applied to both train and test sets.
msc.features.scale( xtrain, xtest, type = c("min-max", "avr-std", "med-mad"))
xtrain    A matrix or data frame with train data. Rows contain samples and columns contain features/variables.
xtest     A matrix or data frame with test data. Rows contain samples and columns contain features/variables.
type      Scaling type. The following types are recognized: "min-max", "avr-std" and "med-mad".
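The formulas behind the three types are not spelled out on this page. As a rough sketch, assuming each type subtracts a per-column location estimate and divides by a per-column spread estimate, both taken from the train set only, the idea could look like this in base R. The name scale_sketch and the choice of estimators (range, sd, and mad with its default constant) are illustrative assumptions, not the package's implementation.

```r
# Hypothetical helper, NOT part of caMassClass: one plausible reading of the
# three type names. Parameters are estimated on the train set only and then
# applied to both sets.
scale_sketch <- function(xtrain, xtest,
                         type = c("min-max", "avr-std", "med-mad")) {
  type   <- match.arg(type)
  xtrain <- as.matrix(xtrain)
  xtest  <- as.matrix(xtest)
  for (j in seq_len(ncol(xtrain))) {
    v <- xtrain[, j]
    v <- v[is.finite(v)]              # drop NA, NaN and +/-Inf before estimating
    if (type == "min-max") {
      center <- min(v);    spread <- max(v) - min(v)
    } else if (type == "avr-std") {
      center <- mean(v);   spread <- sd(v)
    } else {
      center <- median(v); spread <- mad(v)
    }
    # A constant column gives spread == 0; this sketch does not guard against it.
    xtrain[, j] <- (xtrain[, j] - center) / spread
    xtest[, j]  <- (xtest[, j]  - center) / spread
  }
  list(xtrain = xtrain, xtest = xtest)
}
```

A call such as scale_sketch(iris[1:100, -5], iris[101:150, -5], "med-mad") returns a list with the two scaled matrices. Test values may fall outside the train range because only train statistics are used.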
Many classification algorithms perform better if input data is scaled beforehand. Some of them perform scaling internally (for example svm), but many don't. For some it makes no difference (for example rpart or LogitBoost).

If xtrain contains NA values or infinities, all non-finite numbers are omitted from scaling parameter calculations.
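To illustrate what omitting non-finite numbers from the parameter calculation means, here is a small base-R sketch for a single column with min-max style parameters. It assumes the non-finite entries are only excluded while estimating the parameters and then pass through the arithmetic unchanged. How the package treats them in its output is not stated here.

```r
x <- c(2, NA, 4, Inf, 6)            # one feature column with non-finite entries

finite <- x[is.finite(x)]           # is.finite() is FALSE for NA, NaN and +/-Inf
lo <- min(finite)                   # 2, estimated from finite values only
hi <- max(finite)                   # 6, not Inf

scaled <- (x - lo) / (hi - lo)      # NA stays NA, Inf stays Inf
print(scaled)
```

Without the is.finite() filter, min(x) would return NA and every scaled value would become NA.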
A list with two components:
xtrain    A matrix or data frame with scaled train data.
xtest     A matrix or data frame with scaled test data.
Jarek Tuszynski (SAIC) jaroslaw.w.tuszynski@saic.com
Used by the msc.classifier.test and msc.features.select functions.
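library(caMassClass)  # provides msc.features.scale
library(caTools)      # provides sample.split
library(e1071)        # provides svm
data(iris)
mask = sample.split(iris[,5], SplitRatio=1/4) # very few points to train
xtrain = iris[ mask,-5]  # use output of sample.split to ...
xtest  = iris[!mask,-5]  # create train and test subsets
ytrain = iris[ mask, 5]
ytest  = iris[!mask, 5]
# scale both sets with factors taken from the train set, then fit and test
x = msc.features.scale(xtrain, xtest)
model = svm(x$xtrain, ytrain, scale=FALSE)
print(a <- table(predict(model, x$xtest), ytest) )
# same model on the unscaled data, with svm's internal scaling turned off
model = svm(xtrain, ytrain, scale=FALSE)
print(b <- table(predict(model, xtest), ytest) )
# compare correct predictions (diagonals of the two confusion tables)
stopifnot( sum(diag(a))<sum(diag(b)) )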
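For readers without caMassClass installed, the same train-then-apply pattern can be approximated with base R's scale(). This is a sketch of mean and standard-deviation scaling only, using a deterministic split chosen here for reproducibility. It is not claimed to reproduce the output of msc.features.scale.

```r
data(iris)
mask   <- seq_len(nrow(iris)) %% 4 == 1      # every 4th row goes to the train set
xtrain <- as.matrix(iris[ mask, -5])
xtest  <- as.matrix(iris[!mask, -5])

# Estimate column means and standard deviations on the train set ...
strain <- scale(xtrain)
# ... and reuse exactly those parameters for the test set.
stest  <- scale(xtest,
                center = attr(strain, "scaled:center"),
                scale  = attr(strain, "scaled:scale"))

print(round(colMeans(strain), 10))           # train columns are centered
print(round(colMeans(stest), 3))             # test columns generally are not
```

Reusing the train parameters keeps information from the test set out of the preprocessing step, which is the same design the function documented here follows.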