ofw {ofw}R Documentation

Optimal Feature Weighting with CART and SVM

Description

ofw implements a meta-algorithm called "Optimal Feature Weighting" for multiclass classification of continuous variables, aggregating either CART or SVM classifiers.

Usage

## Default S3 method:
ofw(x, y, type="CART", ntree=if(type=="CART") 50 else NULL,
    nforest=if(type=="CART") 100 else NULL,
    nsvm=if(type=="SVM") 100 else NULL, mtry=5,
    do.trace=FALSE, nstable=25,
    keep.inbag=if(type=="CART") FALSE else NULL,
    keep.forest=if(type=="CART") TRUE else NULL, weight=FALSE, ...)
## S3 method for class 'ofw':
print(x, ...)

Arguments

x A data frame with continuous values (for the print method, an ofw object).
y A response vector given as a factor (classification only).
type Classifier used: either "CART" or "SVM".
ntree If CART, number of trees to grow for each iteration (trees aggregation).
nforest If CART, number of iterations to run. This should not be set too small, to ensure convergence of the algorithm.
nsvm If SVM, number of iterations to run. This should be set to a very large number, to ensure the convergence of the algorithm.
mtry Number of variables sampled according to the weight vector P as candidates for each tree or SVM. This should be small enough to ensure stable results of the algorithm.
do.trace If set to an integer, the current iteration is printed every do.trace iterations, along with the number of currently stable variables.
nstable Requires do.trace to be set to an integer. Stopping criterion applied before nforest or nsvm iterations are reached: if the nstable heaviest-weighted variables are unchanged after do.trace iterations, the algorithm stops.
keep.inbag If CART, should an n by ntree matrix be returned that keeps track of which samples are ``in-bag'' in which trees of the last forest (and how many times, since sampling is with replacement).
keep.forest If CART and set to TRUE, the last forest (from the last iteration) is retained in the output object, and the getTree function can be used to see how its trees were constructed.
weight Should the weighting procedure be applied?
... Not currently used.
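As a sketch of how do.trace and nstable interact (illustrative only; the variable names below are hypothetical, not the package's internals): every do.trace iterations, the nstable heaviest-weighted variables are compared with those from the previous check, and the run stops early when the two sets coincide.

```r
## Hypothetical illustration of the nstable stopping test (not ofw internals).
nstable <- 5
P_prev <- runif(20)                       # weight vector at the previous check
P_now  <- P_prev + rnorm(20, sd = 1e-4)   # weights do.trace iterations later
top_prev <- sort(order(P_prev, decreasing = TRUE)[1:nstable])
top_now  <- sort(order(P_now,  decreasing = TRUE)[1:nstable])
converged <- identical(top_prev, top_now) # TRUE => stop before nforest/nsvm
```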

Details

The Optimal Feature Weighting algorithm learns a probability distribution P on all variables. The more useful a variable is in the classification task, the heavier its weight. When the CART classifier is used, the trees are aggregated at each iteration (bagging).
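A minimal caricature of this weighting scheme (a hypothetical multiplicative update, for intuition only; the actual stochastic approximation scheme is described in the references below):

```r
## Toy illustration of Optimal Feature Weighting (not the package's exact update).
set.seed(1)
p <- 20
P <- rep(1 / p, p)                        # uniform initial distribution over variables
for (iter in 1:100) {
  vars <- sample(p, size = 5, prob = P)   # draw mtry = 5 variables according to P
  err  <- runif(1)                        # stand-in for the classifier's error rate
  P[vars] <- P[vars] * exp(1 - err)       # up-weight variables from accurate draws
  P <- P / sum(P)                         # keep P a probability distribution
}
order(P, decreasing = TRUE)[1:5]          # indices of the heaviest-weighted variables
```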

Value

An object of class ofw, which is a list with the following components:

call The original call to ofw.
type Classifier used.
classes Level attributes of the classes of the response y.
mean.error Internal mean error rate vector of length nforest.
prob Probability distribution (weight) vector P, of length equal to the total number of variables.
list Variable names and their respective importance weights, sorted in decreasing order.
ntree If CART, number of trees grown for each iteration.
nforest If CART, number of iterations requested.
nsvm If SVM, number of iterations requested.
maxiter Actual number of iterations performed (differs from nforest or nsvm if nstable stable variables are obtained after do.trace steps).
mtry Number of predictors sampled with P for each tree for each iteration.
do.trace Interval at which the iteration number is printed and the stopping criterion is tested.
nstable Number of stable variables obtained if maxiter < nforest.
weight If TRUE, the weighting procedure was performed.
classWeight If weight = TRUE, the class weight vector.
sampleWeight If weight = TRUE, the sample weight vector.
forest If CART, a list that contains the entire last forest of the last iteration; NULL if keep.forest=FALSE.
inbag If keep.inbag = TRUE, an n by ntree matrix recording which samples are ``in-bag'' in which trees of the last forest (and how many times each was sampled).

Note

The CART part of ofw was initially largely inspired by the randomForest package, which was itself based on Leo Breiman and Adele Cutler's Fortran code. The code is now written in C (and R).

The current implementation of ofw is restricted to classification tasks with continuous variables. It has been especially developed to deal with p >> n data sets, such as microarrays.

Normalisation must first be performed by the user. For extremely large data sets, substantial pre-processing is advised to speed up the procedure.
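For instance, a simple variance-based pre-filter (one common choice for microarray data; not something the package performs itself) could be applied before calling ofw:

```r
## Hypothetical pre-processing: keep the 1000 most variable columns.
x <- matrix(rnorm(50 * 5000), nrow = 50)               # toy 50 x 5000 expression matrix
v <- apply(x, 2, var)                                  # per-variable variance
x.filtered <- x[, order(v, decreasing = TRUE)[1:1000]] # retain the top 1000
dim(x.filtered)                                        # 50 1000
```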

Unlike the CART version, the SVM version of ofw does not provide an internal mean error rate.

Author(s)

Kim-Anh Lê Cao Kim-Anh.Le-Cao@toulouse.inra.fr

Patrick Chabrier Patrick.Chabrier@toulouse.inra.fr

The CART version is based on Leo Breiman and Adele Cutler's Fortran code and on the randomForest package.

References

Gadat, S. and Younes, L. (2007), A Stochastic Algorithm for Feature Selection in Pattern Recognition, Journal of Machine Learning Research 8, 509-548.

Lê Cao, K-A., Gonçalves, O., Besse, P. and Gadat, S. (2007), Selection of biologically relevant genes with a wrapper stochastic algorithm, Statistical Applications in Genetics and Molecular Biology: Vol. 6: Iss. 1, Article 29.

Lê Cao, K-A., Bonnet, A., and Gadat, S., Multiclass classification and gene selection with a stochastic algorithm, http://www.lsp.ups-tlse.fr/Recherche/Publications/2007/cao05.html.

See Also

getTree, learn, evaluate.learn

Examples

## On data set "srbct"
data(srbct)
attach(srbct)

##ofwCART
learn.cart <- ofw(srbct, as.factor(class),type="CART", ntree=100, nforest=200, mtry=5)
print(learn.cart)
## Look at variable importance:
learn.cart$prob
## Look at the 10 most important variables:
learn.cart$list[1:10]
## Look if the internal mean error is decreasing w.r.t the number of iterations:
plot(learn.cart$mean.error, type="l")

##ofwSVM
learn.svm <- ofw(srbct, as.factor(class),type="SVM", nsvm=500, mtry=5)
print(learn.svm)
## Look at variable importance:
learn.svm$prob
## Look at the 10 most important variables:
learn.svm$list[1:10]
## to use the do.trace options:
#learn.cart <- ofw(srbct, as.factor(class), type="CART", ntree=100, nforest=200, mtry=5, do.trace=10, nstable=5)
#learn.svm <- ofw(srbct, as.factor(class),type="SVM", nsvm=500, mtry=5, do.trace=50, nstable=5)
#learn.cart
#learn.svm
detach(srbct)

[Package ofw version 1.0-0 Index]