ofw {ofw}    R Documentation
Optimal Feature Weighting

Description:

ofw implements a meta-algorithm called "Optimal Feature Weighting" for multiclass classification, aggregating either CART or SVM classifiers, in the context of continuous variables.
Usage:

## Default S3 method:
ofw(x, y, type = "CART",
    ntree = if (type == "CART") 50 else NULL,
    nforest = if (type == "CART") 100 else NULL,
    nsvm = if (type == "SVM") 100 else NULL,
    mtry = 5, do.trace = FALSE, nstable = 25,
    keep.inbag = if (type == "CART") FALSE else NULL,
    keep.forest = if (type == "CART") TRUE else NULL,
    weight = FALSE, ...)

## S3 method for class 'ofw':
print(x, ...)
Arguments:

x: A data frame of continuous values (for the print method, an object of class ofw).

y: A response vector given as a factor (classification only).

type: Classifier to aggregate: either "CART" or "SVM".

ntree: If type="CART", the number of trees to grow at each iteration (tree aggregation).

nforest: If type="CART", the number of iterations to run. This should not be set too small, to ensure convergence of the algorithm.

nsvm: If type="SVM", the number of iterations to run. This should be set to a very large number, to ensure convergence of the algorithm.

mtry: Number of variables sampled according to the weight vector P as candidates for each tree or SVM. This should be small enough to ensure stable results.

do.trace: If set to some integer, the current iteration number is printed every do.trace iterations, together with the number of currently stable variables.

nstable: Stopping criterion; requires do.trace to be set to some integer. The run stops before nforest or nsvm iterations are reached if the nstable most heavily weighted variables are unchanged after do.trace iterations (see the sketch after this list).

keep.inbag: If type="CART", should an n by ntree matrix be returned that keeps track of which samples are "in-bag" in which trees of the last forest (and how many times, since the sampling is with replacement)?

keep.forest: If type="CART" and set to TRUE, the last forest (from the last iteration) is retained in the output object, and the getTree function can be used to see how its trees were constructed.

weight: Should the weighting procedure be applied?

...: Currently not used.
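For instance, a minimal sketch of the early-stopping options, reusing the srbct data from the Examples section below (ntree and the two cut-offs are arbitrary choices):

## stop early once the 10 top-weighted variables are unchanged,
## printing progress (and testing the criterion) every 20 iterations
data(srbct)
attach(srbct)
fit <- ofw(srbct, as.factor(class), type = "CART",
           ntree = 50, nforest = 200, mtry = 5,
           do.trace = 20, nstable = 10)
detach(srbct)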
Details:

The Optimal Feature Weighting algorithm learns a probability distribution P over all variables: the more useful a variable is for the classification task, the heavier its weight. When the CART classifier is used, the ntree trees are aggregated at each iteration (bagging).
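To make the idea concrete, here is a minimal R sketch of one iteration of this scheme. It is an illustration only, not the package's actual implementation (which is written in C); the additive update rule, the learning rate `rate`, and the single rpart tree standing in for the aggregated forest are all assumptions.

library(rpart)

## One conceptual OFW step: draw mtry variables according to P, fit a
## classifier on them, and reweight the drawn variables according to how
## well they classified.
ofw.step <- function(x, y, P, mtry = 5, rate = 0.01) {
    vars <- sample(seq_along(P), size = mtry, prob = P)
    fit <- rpart(y ~ ., data = data.frame(x[, vars, drop = FALSE], y = y))
    err <- mean(predict(fit, type = "class") != y)
    P[vars] <- P[vars] + rate * (1 - err)   # reward useful variables
    P / sum(P)                              # keep P a probability distribution
}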
Value:

An object of class ofw, which is a list with the following components:
call: The original call to ofw.

type: The classifier used.

classes: Level attributes of the classes of the response y.

mean.error: Internal mean error rate vector of length nforest.

prob: Probability distribution vector P (the weighting vector), of length the total number of variables.

list: Names of the variables and their respective importance weights, sorted in decreasing order.

ntree: If type="CART", the number of trees grown at each iteration.

nforest: If type="CART", the number of iterations requested in the procedure.

nsvm: If type="SVM", the number of iterations requested in the procedure.

maxiter: Actual number of iterations performed (different from nforest or nsvm if nstable stable variables were obtained after do.trace steps).

mtry: Number of predictors sampled with P for each tree at each iteration.

do.trace: The iteration number is printed every do.trace iterations and the stopping criterion is tested.

nstable: Number of stable variables obtained if maxiter < nforest (or nsvm).

weight: If TRUE, the weighting procedure was performed.

classWeight: If weight=TRUE, the class weight vector.

sampleWeight: If weight=TRUE, the sample weight vector.

forest: If type="CART", a list that contains the entire last forest of the last iteration; NULL if keep.forest=FALSE.

inbag: If keep.inbag=TRUE, an n by ntree matrix recording which samples were "in-bag" in which trees of the last forest (and how many times each was sampled). See the sketch after this list.
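As an illustration, the inbag matrix can be inspected as follows (a sketch assuming a fit obtained with keep.inbag=TRUE on the srbct data used in the Examples; the parameter values are arbitrary):

data(srbct)
attach(srbct)
fit <- ofw(srbct, as.factor(class), type = "CART",
           ntree = 100, nforest = 100, mtry = 5, keep.inbag = TRUE)
## rows are samples, columns are the trees of the last forest;
## entries count how many times a sample was drawn for a given tree
rowSums(fit$inbag)      # times each sample was drawn over the last forest
mean(fit$inbag == 0)    # overall proportion of out-of-bag entries
detach(srbct)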
Note:

The ofw structure for CART was largely inspired by the randomForest package, which was itself based on Leo Breiman and Adele Cutler's Fortran code. The code is now written in C (and R).
The current implementation of ofw is restricted to classification tasks with continuous variables. It has been especially developed to deal with p >> n data sets, such as microarrays.

Normalisation has first to be performed by the user. For extremely large data sets, substantial pre-processing (e.g. pre-filtering the variables) is advised to speed up the procedure, as sketched below.
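For example, a minimal pre-filtering sketch along these lines (the matrix name `expr` and the cut-off of 1000 variables are hypothetical placeholders):

## keep the 1000 most variable columns of a samples x variables matrix,
## then normalise only what is kept before passing it to ofw
keep <- order(apply(expr, 2, var), decreasing = TRUE)[1:1000]
x.small <- as.data.frame(scale(expr[, keep]))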
Contrary to the CART case, the SVM version of ofw does not provide the internal mean error rate.
Author(s):

Kim-Anh Lê Cao <Kim-Anh.Le-Cao@toulouse.inra.fr>
Patrick Chabrier <Patrick.Chabrier@toulouse.inra.fr>

The ofwCART version is based on Leo Breiman and Adele Cutler's Fortran code and on the randomForest package.
References:

Gadat, S. and Younes, L. (2007), A Stochastic Algorithm for Feature Selection in Pattern Recognition, Journal of Machine Learning Research 8, 509-548.

Lê Cao, K.-A., Gonçalves, O., Besse, P. and Gadat, S. (2007), Selection of biologically relevant genes with a wrapper stochastic algorithm, Statistical Applications in Genetics and Molecular Biology, Vol. 6, Iss. 1, Article 29.

Lê Cao, K.-A., Bonnet, A. and Gadat, S., Multiclass classification and gene selection with a stochastic algorithm, http://www.lsp.ups-tlse.fr/Recherche/Publications/2007/cao05.html.
See Also:

getTree, learn, evaluate.learn
Examples:

## On data set "srbct"
data(srbct)
attach(srbct)

## ofwCART
learn.cart <- ofw(srbct, as.factor(class), type = "CART",
                  ntree = 100, nforest = 200, mtry = 5)
print(learn.cart)

## Look at the variable importances:
learn.cart$prob

## Look at the 10 most important variables:
learn.cart$list[1:10]

## Check whether the internal mean error decreases with the number of iterations:
plot(learn.cart$mean.error, type = "l")

## ofwSVM
learn.svm <- ofw(srbct, as.factor(class), type = "SVM", nsvm = 500, mtry = 5)
print(learn.svm)

## Look at the variable importances:
learn.svm$prob

## Look at the 10 most important variables:
learn.svm$list[1:10]

## To use the do.trace option:
#learn.cart <- ofw(srbct, as.factor(class), type = "CART", ntree = 100,
#                  nforest = 200, mtry = 5, do.trace = 10, nstable = 5)
#learn.svm <- ofw(srbct, as.factor(class), type = "SVM", nsvm = 500,
#                 mtry = 5, do.trace = 50, nstable = 5)
#learn.cart
#learn.svm

detach(srbct)