rfeControl {caret} | R Documentation |
This function generates a control object that can be used to specify the details of the feature selection algorithms used in this package.
rfeControl(functions = NULL, rerank = FALSE, method = "boot", saveDetails = FALSE, number = ifelse(method == "cv", 10, 25), verbose = TRUE, returnResamp = "all", p = .75, index = NULL, workers = 1, computeFunction = lapply, computeArgs = NULL)
functions |
a list of functions for model fitting, prediction and variable importance (see Details below) |
rerank |
a logical: should variable importance be re-calculated each time features are removed? |
method |
The external resampling method: boot, cv, LOOCV or LGOCV (for repeated training/test splits) |
number |
Either the number of folds or number of resampling iterations |
saveDetails |
a logical to save the predictions and variable importances from the selection process |
verbose |
a logical to print a log for each external resampling iteration |
returnResamp |
A character string indicating how much of the resampled summary metrics should be saved. Values can be "final", "all" or "none" |
p |
For leave-group out cross-validation: the training percentage |
index |
a list with elements for each external resampling iteration. Each list element is the sample rows used for training at that iteration. |
workers |
an integer that specifies how many machines/processors will be used |
computeFunction |
a function that is lapply or emulates lapply. It must have arguments X, FUN and ... . computeFunction can be used to build models in parallel. See the examples in rfe. |
computeArgs |
Extra arguments to pass into the ... slot in computeFunction. See the examples in rfe. |
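As an illustration of the index and computeFunction arguments, the following base R sketch builds a list of training rows for leave-group-out resampling and defines a wrapper with the same calling convention as lapply. The names n, verboseLapply and sizes are hypothetical and are not part of the package; the way the training rows are drawn is an assumption about a reasonable splitting scheme, not a description of what the package does internally.

```r
set.seed(42)

n      <- 40    # assumed number of samples in the training data
p      <- 0.75  # training percentage, as in the p argument
number <- 5     # number of resampling iterations

# One list element per resampling iteration; each element holds the
# row numbers used for training at that iteration.
index <- lapply(seq_len(number), function(i) {
  sort(sample(n, size = floor(p * n)))
})

# Hypothetical stand-in for computeFunction: it has the arguments
# X, FUN and ..., and simply delegates to lapply after logging.
verboseLapply <- function(X, FUN, ...) {
  message("applying FUN to ", length(X), " elements")
  lapply(X, FUN, ...)
}

# The wrapper behaves like lapply for any list input.
sizes <- verboseLapply(index, length)
```

A function that distributes the work over several processors could be substituted for verboseLapply, provided it keeps the X, FUN and ... arguments.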
Backwards selection requires functions to be specified for some operations.
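The fit function builds the model based on the current data set. The arguments for the function must be:
x |
the current training set of predictors, restricted to the current subset of variables |
y |
the current outcome data |
first |
a single logical value: TRUE when the current predictor set contains all possible variables |
last |
similar to first, but TRUE when the last model is fit with the final subset size and predictors |
... |
optional arguments passed to the fit function in the call to rfe |
The function should return a model object that can be used to generate predictions.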
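A minimal sketch of a fit function with the required arguments, assuming a linear model is wanted. The name lmFit and the .outcome column are hypothetical choices for this illustration; the sketch ignores first and last, which a real function might use to tune the model only once or to keep extra output for the final fit.

```r
# Hypothetical fit function: arguments x, y, first, last and ...
lmFit <- function(x, y, first, last, ...) {
  dat <- as.data.frame(x)
  dat$.outcome <- y                  # attach the outcome to the predictors
  lm(.outcome ~ ., data = dat, ...)  # the returned object supports predict()
}

# Usage with a built-in data set
x <- mtcars[, c("wt", "hp")]
y <- mtcars$mpg
model <- lmFit(x, y, first = TRUE, last = FALSE)
```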
The pred function returns a vector of predictions (numeric or factors) from the current model. The arguments are:
object |
the model generated by the fit function |
x |
the current set of predictors for the held-back samples |
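A sketch of a pred function for a linear model, assuming the model object was produced by lm. The name lmPred and the train/held-out split of mtcars are hypothetical and serve only to make the example runnable.

```r
# Hypothetical pred function: takes the fitted model and the
# predictors of the held-back samples, returns a plain numeric vector.
lmPred <- function(object, x) {
  unname(predict(object, newdata = as.data.frame(x)))
}

# Usage: fit on the first 24 rows, predict the remaining 8
train   <- mtcars[1:24, ]
heldOut <- mtcars[25:32, c("wt", "hp")]
model   <- lm(mpg ~ wt + hp, data = train)
preds   <- lmPred(model, heldOut)
```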
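The rank function is used to return the predictors in the order of the most important to the least important. Inputs are:
object |
the model generated by the fit function |
x |
the current set of predictors for the training samples |
y |
the current training outcomes |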
The function should return a data frame with a column called vars that has the current variable names. The first row should be the most important predictor etc. Other columns can be included in the output and will be returned in the final rfe object.
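A sketch of a rank function for a linear model. Ranking by the absolute t statistic of each coefficient is an assumption made for this illustration, not necessarily what the package's own functions do; the names lmRank and Overall are likewise hypothetical.

```r
# Hypothetical rank function: orders predictors by |t statistic|.
lmRank <- function(object, x, y) {
  coefs <- summary(object)$coefficients
  keep  <- rownames(coefs) != "(Intercept)"
  tstat <- abs(coefs[keep, "t value", drop = FALSE])
  out <- data.frame(
    vars    = rownames(tstat),   # required column of variable names
    Overall = as.vector(tstat),  # extra column, kept in the output
    stringsAsFactors = FALSE
  )
  # First row is the most important predictor
  out[order(out$Overall, decreasing = TRUE), , drop = FALSE]
}

# Usage
x <- mtcars[, c("wt", "hp", "qsec")]
y <- mtcars$mpg
model   <- lm(mpg ~ wt + hp + qsec, data = mtcars)
ranking <- lmRank(model, x, y)
```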
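The selectSize function determines the optimal number of predictors based on the resampling output. Inputs for the function are:
x |
a matrix or data frame of the resampled performance metrics, with one row per subset size and a column called "Variables" giving the subset size |
metric |
a character string naming the performance measure to optimize |
maximize |
a single logical: should the metric be maximized? |
This function should return an integer corresponding to the optimal subset size. caret comes with two example functions for this purpose: pickSizeBest and pickSizeTolerance.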
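A sketch of a selectSize function that simply picks the subset size with the best value of the chosen metric. The name pickBest and the performance table perf are made up for this illustration; the numbers are not real results.

```r
# Hypothetical selectSize function: returns the subset size whose
# metric value is best (largest if maximize is TRUE, else smallest).
pickBest <- function(x, metric, maximize) {
  best <- if (maximize) which.max(x[, metric]) else which.min(x[, metric])
  as.integer(x[best, "Variables"])
}

# Illustrative performance summary: one row per subset size
perf <- data.frame(
  Variables = c(1, 2, 4, 8),
  RMSE      = c(3.1, 2.4, 2.6, 2.9)
)

size <- pickBest(perf, metric = "RMSE", maximize = FALSE)
```

Here the smallest RMSE occurs in the second row, so size is 2. A tolerance-based rule would instead accept a smaller subset whose performance is within some margin of the best.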
After the optimal subset size is determined, the selectVar function will be used to calculate the best rankings for each variable across all the resampling iterations. Inputs for the function are:
y |
a list of variable importances for each resampling iteration and each subset size (generated by the user-defined rank function). In the example, for each of the cross-validation groups the output of the rank function is saved for each of the subset sizes (including the original subset). If the rankings are not recomputed at each iteration, the values will be the same within each cross-validation iteration. |
size |
the integer returned by the selectSize function |
This function should return a character vector of predictor names (of length size) in the order of most important to least important.
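A sketch of a selectVar function, assuming each element of y is a data frame with a vars column and a numeric importance column called Overall (the latter is an assumption about what the rank function returned). Averaging the importance across resamples is one plausible aggregation, not necessarily the one used by the package; the name selectTop and the toy rankings are hypothetical.

```r
# Hypothetical selectVar function: average the importance of each
# variable over all saved rankings and keep the top `size` names.
selectTop <- function(y, size) {
  allRanks <- do.call(rbind, y)  # stack the per-resample rankings
  avgImp   <- tapply(allRanks$Overall, allRanks$vars, mean)
  ord      <- order(as.vector(avgImp), decreasing = TRUE)
  names(avgImp)[ord][seq_len(size)]
}

# Toy rankings from two resampling iterations
y <- list(
  data.frame(vars = c("wt", "hp", "qsec"), Overall = c(5, 3, 1),
             stringsAsFactors = FALSE),
  data.frame(vars = c("hp", "wt", "qsec"), Overall = c(4, 3.5, 2),
             stringsAsFactors = FALSE)
)

top <- selectTop(y, size = 2)
```

With these toy values the averages are 4.25 for wt, 3.5 for hp and 1.5 for qsec, so top contains "wt" followed by "hp".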
Examples of these functions are included in the package: lmFuncs, rfFuncs, treebagFuncs and nbFuncs.
More details about these functions, including examples, are in the package vignette for feature selection.
A list
Max Kuhn
rfe, lmFuncs, rfFuncs, treebagFuncs, nbFuncs, pickSizeBest, pickSizeTolerance