cforest {party}                                               R Documentation
Description

An implementation of the random forest and bagging ensemble algorithms
utilizing conditional inference trees as base learners.
Usage

cforest(formula, data = list(), subset = NULL, weights = NULL,
        controls = cforest_control(),
        xtrafo = ptrafo, ytrafo = ptrafo, scores = NULL)

varimp(x, mincriterion = 0.0)
Arguments

formula: a symbolic description of the model to be fit.

data: a data frame containing the variables in the model.

subset: an optional vector specifying a subset of observations to be
    used in the fitting process.

weights: an optional vector of weights to be used in the fitting
    process. Non-negative integer-valued weights are allowed, as well
    as non-negative real weights. Observations are sampled (with or
    without replacement) according to probabilities
    weights / sum(weights). The fraction of observations to be sampled
    (without replacement) is computed based on the sum of the weights
    if all weights are integer-valued, and based on the number of
    weights greater than zero otherwise (a small numeric sketch of this
    rule follows the argument list).

controls: an object of class ForestControl-class, which can be
    obtained using cforest_control.

xtrafo: a function to be applied to all input variables. By default,
    the ptrafo function is applied.

ytrafo: a function to be applied to all response variables. By
    default, the ptrafo function is applied.

scores: an optional named list of scores to be attached to ordered
    factors.

x: an object as returned by cforest.

mincriterion: the value of the test statistic or 1 - p-value that must
    be exceeded in order to make use of a split. See ctree_control.
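The following toy computation illustrates the sampling-fraction rule
described under weights above (a sketch of the documented behaviour,
not code taken from party itself):

    w_int  <- c(2, 0, 1, 3)        ### all weights are integer-valued:
    sum(w_int)                     ### 6 observations are drawn in total

    w_real <- c(0.5, 0, 1.2, 0.3)  ### at least one non-integer weight:
    sum(w_real > 0)                ### 3, the number of weights greater than zero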
Details

This implementation of the random forest (and bagging) algorithm
differs from the reference implementation in randomForest with respect
to the base learner used and the aggregation scheme applied.
Conditional inference trees, see ctree, are fitted to each of the
ntree (defined via cforest_control) bootstrap samples of the learning
sample. There are many hyperparameters that can be controlled, see
cforest_control. You MUST NOT change anything you don't understand
completely.
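For illustration, a small forest can be grown like this (a sketch: the
airquality data, the low ntree value and mtry = 3 are illustrative
choices, not recommendations):

    library("party")

    ### a small forest on the built-in airquality data;
    ### ntree and mtry are passed via cforest_control()
    aq  <- subset(airquality, !is.na(Ozone))
    fit <- cforest(Ozone ~ ., data = aq,
                   controls = cforest_control(ntree = 50, mtry = 3))
    predict(fit, OOB = TRUE)[1:5]   ### out-of-bag predictions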
The aggregation scheme works by averaging observation weights
extracted from each of the ntree trees and NOT by averaging
predictions directly. See Hothorn et al. (2004) for a description.
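The following toy computation sketches this idea for a univariate
numeric response (conceptual only; it does not use party's internal
data structures):

    ### each tree contributes a vector of observation weights over the
    ### learning sample for a new observation; the forest averages these
    ### weights and predicts the weighted mean of the observed responses
    y       <- c(3.1, 2.7, 5.0, 4.2)     ### learning-sample responses (toy data)
    w_tree1 <- c(1, 0, 1, 0)             ### weights from tree 1 for the new obs.
    w_tree2 <- c(0, 1, 1, 0)             ### weights from tree 2
    w       <- (w_tree1 + w_tree2) / 2   ### averaged observation weights
    sum(w * y) / sum(w)                  ### aggregated forest prediction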
Ensembles of conditional inference trees have not yet been extensively
tested, so this routine is meant for the expert user only and its
current state is rather experimental. However, there are some things
that can't be done with randomForest, for example fitting forests to
censored response variables or to multivariate and ordered responses.
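As a sketch of the ordered-response case, the scores argument can
attach numeric scores to an ordered factor (this assumes the mammoexp
data used in the Examples section is available, and the score values
1, 2, 3 are purely illustrative):

    ### ordered response with user-defined scores attached to the
    ### ordered factor ME (scores 1, 2, 3 are illustrative)
    mst <- cforest(ME ~ ., data = mammoexp,
                   scores = list(ME = c(1, 2, 3)),
                   controls = cforest_control(ntree = 50))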
By default, raw test statistics are maximized and five inputs are
randomly examined for possible splits in each node. Note that this
implies biased internal variable selection, which might affect
variable importance measures derived from such a forest.
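If this bias is a concern, the split criteria can be changed; the
sketch below assumes that cforest_control accepts the teststat,
testtype and mincriterion settings documented for ctree_control:

    ### p-value based variable selection instead of maximizing raw test
    ### statistics (an assumption: these ctree_control settings are
    ### forwarded unchanged by cforest_control)
    ctrl <- cforest_control(teststat = "quad", testtype = "Univariate",
                            mincriterion = 0.95, ntree = 50)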
Function varimp can be used to compute variable importance measures
similar to those computed by importance.
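For example, applied to the small airquality forest sketched above:

    ### variable importance for the fitted forest 'fit' from the sketch above
    vi <- varimp(fit)
    sort(vi, decreasing = TRUE)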
Value

An object of class RandomForest-class.
References

Leo Breiman (2001). Random Forests. Machine Learning, 45(1), 5–32.

Torsten Hothorn, Berthold Lausen, Axel Benner and Martin
Radespiel-Troeger (2004). Bagging Survival Trees. Statistics in
Medicine, 23(1), 77–91.

Torsten Hothorn, Peter Bühlmann, Sandrine Dudoit, Annette Molinaro and
Mark J. van der Laan (2006). Survival Ensembles. Biostatistics, 7(3),
355–373.
Examples

### honest (i.e., out-of-bag) cross-classification of
### true vs. predicted classes
table(mammoexp$ME,
      predict(cforest(ME ~ ., data = mammoexp,
                      controls = cforest_control(ntree = 50)),
              OOB = TRUE))

### fit forest to censored response
if (require("ipred")) {

    data("GBSG2", package = "ipred")
    bst <- cforest(Surv(time, cens) ~ ., data = GBSG2,
                   controls = cforest_control(ntree = 50))

    ### estimate conditional Kaplan-Meier curves
    treeresponse(bst, newdata = GBSG2[1:2,], OOB = TRUE)

    ### if you can't resist looking at individual trees ...
    party:::prettytree(bst@ensemble[[1]], names(bst@data@get("input")))
}