gamboost {mboost}                R Documentation

Gradient Boosting with Smooth Components

Description

Gradient boosting for optimizing arbitrary loss functions, where component-wise smoothing procedures are utilized as base learners.

Usage

## S3 method for class 'formula':
gamboost(formula, data = list(), weights = NULL, 
        na.action = na.omit, ...)
## S3 method for class 'matrix':
gamboost(x, y, weights = NULL, ...)
gamboost_fit(object, baselearner = c("bbs", "bss", "bols", 
             "bns", "btree"), dfbase = 4, family = GaussReg(), 
             control = boost_control(), weights = NULL)
## S3 method for class 'gamboost':
plot(x, which = NULL, ask = TRUE && dev.interactive(), 
    type = "b", ylab = expression(f[partial]), add_rug = TRUE, ...)

Arguments

formula a symbolic description of the model to be fit.
data a data frame containing the variables in the model.
weights an optional vector of weights to be used in the fitting process.
na.action a function which indicates what should happen when the data contain NAs.
x design matrix (for gamboost.matrix) or an object returned by gamboost to be plotted via plot.
y vector of responses.
object an object of class boost_data, see boost_dpp.
baselearner a character specifying the component-wise base learner to be used: bss means smoothing splines (see Buhlmann and Yu 2003), bbs P-splines with a B-spline basis (see Schmid and Hothorn 2007), bns P-splines with a natural spline basis, bols linear models, bspatial bivariate tensor product penalized splines, brandom random effects, and btree stumps. Component-wise smoothing splines were considered by Buhlmann and Yu (2003), and Schmid and Hothorn (2007) investigate P-splines with a B-spline basis. Kneib, Hothorn and Tutz (2009) also utilize P-splines with a B-spline basis, supplement them with their bivariate tensor product version to estimate interaction surfaces and spatial effects, and additionally consider random-effects base learners.
dfbase an integer vector giving the degrees of freedom for the smoothing spline, either globally for all variables (when its length is one) or separately for each covariate.
family an object of class boost_family implementing the negative gradient corresponding to the loss function to be optimized. By default, squared error loss for continuous responses is used.
control an object of class boost_control.
which if only a subset of the plots is required, specify a subset of the variables. By default, only the selected variables are plotted.
ask logical; if TRUE, the user is asked before each plot, see par(ask=.).
type what type of plot should be drawn: see plot.
ylab a title for the y axis: see title.
add_rug logical; if TRUE, rugs are added.
... additional arguments passed to callees.
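Non-Gaussian responses are handled via the family argument; a minimal sketch, assuming a data frame d with a binary factor response yf and covariates x1, x2 (all illustrative, not from this page):

```r
### sketch: boosting a logistic additive model via the Binomial family
### d, yf, x1 and x2 are hypothetical example objects
mod <- gamboost(yf ~ x1 + x2, data = d, family = Binomial(),
                control = boost_control(mstop = 200))
```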

Details

A (generalized) additive model is fitted using a boosting algorithm based on component-wise univariate base learners. The base learners can be specified either via the formula object or via the baselearner argument (see bbs for an example). If the base learners specified in formula differ from baselearner, the latter argument is ignored.
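Base learners can also be mixed within a single formula; a hedged sketch (the data frame d and covariates x1, x2 are illustrative):

```r
### sketch: different base learners for different covariates,
### specified directly in the formula (d, x1, x2 are hypothetical)
mod <- gamboost(y ~ bbs(x1, df = 4) + bols(x2), data = d,
                control = boost_control(mstop = 100))
```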

The function gamboost_fit provides access to the fitting procedure without data pre-processing, e.g. for cross-validation.
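In practice, the number of boosting iterations mstop is the main tuning parameter; a sketch of choosing it by resampling with cvrisk (see the cvrisk help page), applied to the cars.gb model fitted in the Examples section:

```r
### sketch: choose mstop by bootstrap cross-validation of the empirical risk
cvm <- cvrisk(cars.gb)   # cars.gb as fitted in the Examples section
plot(cvm)                # risk across iterations and resamples
mstop(cvm)               # iteration minimizing the resampled risk
cars.gb[mstop(cvm)]      # restrict the model to that iteration
```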

Note that, as of version 1.1-0, penalized B-splines (bbs) instead of smoothing splines (bss) are used as the default base learner.

Value

An object of class gamboost, for which print, AIC and predict methods are available.

References

Peter Buhlmann and Bin Yu (2003), Boosting with the L2 loss: regression and classification. Journal of the American Statistical Association, 98, 324–339.

Peter Buhlmann and Torsten Hothorn (2007), Boosting algorithms: regularization, prediction and model fitting. Statistical Science, 22(4), 477–505.

Thomas Kneib, Torsten Hothorn and Gerhard Tutz (2009), Variable selection and model choice in geoadditive regression models. Biometrics, accepted. http://epub.ub.uni-muenchen.de/2063/

Matthias Schmid and Torsten Hothorn (2009), Boosting additive models using component-wise P-splines as base-learners. Computational Statistics & Data Analysis, accepted. http://epub.ub.uni-muenchen.de/2057/

Examples


    ### a simple two-dimensional example: cars data
    cars.gb <- gamboost(dist ~ speed, data = cars, dfbase = 4, 
                        control = boost_control(mstop = 50))
    cars.gb
    AIC(cars.gb, method = "corrected")

    ### plot fit for mstop = 1, ..., 50
    plot(dist ~ speed, data = cars)    
    tmp <- sapply(1:mstop(AIC(cars.gb)), function(i)
        lines(cars$speed, predict(cars.gb[i]), col = "red"))          
    lines(cars$speed, predict(smooth.spline(cars$speed, cars$dist),
                              cars$speed)$y, col = "green")

    ### artificial example: sine transformation
    x <- sort(runif(100)) * 10
    y <- sin(x) + rnorm(length(x), sd = 0.25)
    plot(x, y)
    ### linear model
    lines(x, fitted(lm(y ~ sin(x) - 1)), col = "red")
    ### GAM
    lines(x, fitted(gamboost(y ~ x - 1, 
                    control = boost_control(mstop = 500))), 
          col = "green")


[Package mboost version 1.1-0 Index]