gamboost {mboost}                                        R Documentation

Gradient Boosting with Component-wise Smoothing Splines

Description

Gradient boosting for optimizing arbitrary loss functions where component-wise smoothing splines are utilized as base learners.

Usage

## S3 method for class 'formula':
gamboost(formula, data = list(), weights = NULL, ...)
## S3 method for class 'matrix':
gamboost(x, y, weights = NULL, ...)
gamboost_fit(object, baselearner = c("ssp", "bsp", "ols"), 
             dfbase = 4, family = GaussReg(), 
             control = boost_control(), weights = NULL)

Arguments

formula a symbolic description of the model to be fit.
data a data frame containing the variables in the model.
weights an optional vector of weights to be used in the fitting process.
x design matrix.
y vector of responses.
object an object of class boost_data, see boost_dpp.
baselearner a character string specifying the component-wise base-learner to be used: ssp means smoothing splines, bsp means B-splines (see bs) and ols means linear models; see the sketch after this list. Note that so far only the characteristics of component-wise smoothing splines have been investigated, theoretically and practically.
dfbase an integer vector giving the degrees of freedom for the smoothing spline, either globally for all variables (when its length is one) or separately for each covariate.
family an object of class boost_family implementing the negative gradient corresponding to the loss function to be optimized. By default, squared error loss for continuous responses is used.
control an object of class boost_control.
... additional arguments passed to callees.
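
To illustrate how these arguments fit together, here is a minimal sketch on the cars data (the mstop values are arbitrary, and it is assumed that baselearner and dfbase are forwarded from gamboost to gamboost_fit via ...):

    ### smoothing-spline base-learner with 4 degrees of freedom
    gb.ssp <- gamboost(dist ~ speed, data = cars, dfbase = 4,
                       family = GaussReg(),
                       control = boost_control(mstop = 100))
    ### the same model with component-wise B-splines (assumes that
    ### baselearner is passed through '...' to gamboost_fit)
    gb.bsp <- gamboost(dist ~ speed, data = cars, baselearner = "bsp",
                       control = boost_control(mstop = 100))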

Details

A (generalized) additive model is fitted using a boosting algorithm based on component-wise univariate smoothing splines. The methodology is described in Bühlmann and Yu (2003). If dfbase = 1, a univariate linear model is used as base learner (resulting in a linear partial fit for this variable).
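
For example (a minimal sketch, not part of the original page; the mstop value is arbitrary), dfbase = 1 yields a straight-line partial fit on the cars data:

    ### dfbase = 1 reduces the base-learner to a univariate linear
    ### model, so the boosted fit is linear in speed
    gb.lin <- gamboost(dist ~ speed, data = cars, dfbase = 1,
                       control = boost_control(mstop = 100))
    plot(dist ~ speed, data = cars)
    lines(cars$speed, predict(gb.lin), col = "blue")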

The function gamboost_fit provides direct access to the fitting procedure, bypassing the data pre-processing performed by the formula and matrix interfaces (see boost_dpp), e.g. for cross-validation.
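
A hypothetical sketch of this route (it is assumed here that boost_dpp accepts a formula and a data frame and returns an object of class boost_data, as its help page suggests):

    ### pre-process once, then fit without the formula interface,
    ### e.g. inside a hand-rolled cross-validation loop
    dpp <- boost_dpp(dist ~ speed, data = cars)   # assumed signature
    fit <- gamboost_fit(dpp, baselearner = "ssp", dfbase = 4,
                        family = GaussReg(),
                        control = boost_control(mstop = 50))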

Value

An object of class gamboost, for which print, AIC and predict methods are available.

References

Peter Bühlmann and Bin Yu (2003), Boosting with the L2 loss: regression and classification. Journal of the American Statistical Association, 98, 324–339.

Peter Bühlmann and Torsten Hothorn (2007), Boosting algorithms: regularization, prediction and model fitting. Statistical Science, accepted. ftp://ftp.stat.math.ethz.ch/Research-Reports/Other-Manuscripts/buhlmann/BuehlmannHothorn_Boosting-rev.pdf

Examples


    ### a simple two-dimensional example: cars data
    cars.gb <- gamboost(dist ~ speed, data = cars, dfbase = 4, 
                        control = boost_control(mstop = 50))
    cars.gb
    AIC(cars.gb, method = "corrected")

    ### plot fit for mstop = 1, ..., 50
    plot(dist ~ speed, data = cars)    
    tmp <- sapply(1:mstop(AIC(cars.gb)), function(i)
        lines(cars$speed, predict(cars.gb[i]), col = "red"))          
    lines(cars$speed, predict(smooth.spline(cars$speed, cars$dist),
                              cars$speed)$y, col = "green")

    ### artificial example: sine transformation
    x <- sort(runif(100)) * 10
    y <- sin(x) + rnorm(length(x), sd = 0.25)
    plot(x, y)
    ### linear model
    lines(x, fitted(lm(y ~ sin(x) - 1)), col = "red")
    ### GAM
    lines(x, fitted(gamboost(y ~ x - 1, 
                    control = boost_control(mstop = 500))), 
          col = "green")


[Package mboost version 0.5-8]