blackboost {mboost}                                        R Documentation

Gradient Boosting with Regression Trees

Description

Gradient boosting for optimizing arbitrary loss functions where regression trees are utilized as base learners.

Usage

## S3 method for class 'formula':
blackboost(formula, data = list(), weights = NULL, ...)
## S3 method for class 'matrix':
blackboost(x, y, weights = NULL, ...)
blackboost_fit(object, tree_controls = 
    ctree_control(teststat = "max",
                  testtype = "Teststatistic",
                  mincriterion = 0,
                  maxdepth = 2),
    fitmem = ctree_memory(object, TRUE), family = GaussReg(), 
    control = boost_control(), weights = NULL)

Arguments

formula a symbolic description of the model to be fit.
data a data frame containing the variables in the model.
weights an optional vector of weights to be used in the fitting process.
x design matrix.
y vector of responses.
object an object of class boost_data, see boost_dpp.
tree_controls an object of class TreeControl, which can be obtained using ctree_control. Defines the hyper-parameters of the trees used as base learners; a sketch of passing custom settings follows this list. Make sure you understand the consequences of altering any of its arguments.
fitmem an object of class TreeFitMemory.
family an object of class boost_family-class, implementing the negative gradient corresponding to the loss function to be optimized. By default, squared error loss for continuous responses is used.
control an object of class boost_control which defines the hyper-parameters of the boosting algorithm.
... additional arguments passed to callees.
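
The tree and boosting hyper-parameters can be tuned by passing tree_controls and control through the formula interface (via ...). The following sketch grows deeper base learners via maxdepth; note that the nu argument to boost_control (the step size) is an assumption based on common mboost usage and may differ between versions.

    ### a sketch: deeper trees and a longer, damped boosting run;
    ### boost_control(nu = ...) is assumed to set the step size
    cars.deep <- blackboost(dist ~ speed, data = cars,
                            tree_controls = ctree_control(teststat = "max",
                                                          testtype = "Teststatistic",
                                                          mincriterion = 0,
                                                          maxdepth = 4),
                            control = boost_control(mstop = 100, nu = 0.1))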

Details

This function implements the 'classical' gradient boosting algorithm utilizing regression trees as base learners. Essentially the same algorithm is implemented in package gbm. The main difference is that arbitrary loss functions to be optimized can be specified via the family argument to blackboost, whereas gbm uses hard-coded loss functions. Moreover, the base learners (conditional inference trees, see ctree) are somewhat more flexible.
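
For instance, replacing the default squared error loss only requires supplying a different family object. A minimal sketch, assuming the Laplace() family (absolute error loss) is available in the installed mboost version:

    ### boosting under absolute error loss instead of squared error;
    ### Laplace() is assumed to be provided by mboost
    cars.lad <- blackboost(dist ~ speed, data = cars,
                           family = Laplace(),
                           control = boost_control(mstop = 50))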

The regression fit is a black box prediction machine and thus hardly interpretable.

Usually, the formula-based interface blackboost should be used. When necessary (for example, for cross-validation), the function blackboost_fit, operating on objects of class boost_data, is a faster alternative.
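
A minimal sketch of this faster path, assuming boost_dpp accepts a formula and a data frame (as its role in constructing boost_data objects suggests):

    ### preprocess once, then fit repeatedly, e.g. inside a
    ### cross-validation loop; the boost_dpp() signature is assumed
    dpp <- boost_dpp(dist ~ speed, data = cars)
    cars.fit <- blackboost_fit(dpp, control = boost_control(mstop = 50))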

Value

An object of class blackboost, for which print and predict methods are available.

References

Jerome H. Friedman (2001), Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29, 1189–1232.

Greg Ridgeway (1999), The state of boosting. Computing Science and Statistics, 31, 172–181.

Peter Bühlmann and Torsten Hothorn (2007), Boosting algorithms: regularization, prediction and model fitting. Statistical Science, 22(4), 477–505.

Torsten Hothorn, Kurt Hornik and Achim Zeileis (2006). Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics, 15(3), 651–674.

Examples


    ### a simple two-dimensional example: cars data
    cars.gb <- blackboost(dist ~ speed, data = cars,
                          control = boost_control(mstop = 50))
    cars.gb

    ### plot fit
    plot(dist ~ speed, data = cars)
    lines(cars$speed, predict(cars.gb), col = "red")
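
    ### predictions for new observations; newdata support is assumed
    ### to follow the usual predict() convention
    predict(cars.gb, newdata = data.frame(speed = c(10, 20)))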

