cvrisk {mboost}    R Documentation

Cross-Validation

Description

Cross-validated estimation of the empirical risk for hyper-parameter selection.

Usage

cvrisk(object, folds, grid = c(1:mstop(object)))

Arguments

object    an object of class gb.
folds     a weight matrix with one row per observation and one column per cross-validation run.
grid      a vector of stopping parameters (numbers of boosting iterations) at which the empirical risk is evaluated.

Details

The number of boosting iterations is a hyper-parameter of the boosting algorithms implemented in this package. This function computes honest, i.e., cross-validated, estimates of the empirical risk for different stopping parameters mstop. These estimates can be used to choose an appropriate number of boosting iterations.

Different forms of cross-validation can be applied, for example 10-fold cross-validation or bootstrapping. Each column of the folds matrix holds the observation weights for one run; observations with zero weight serve as test cases for that run.
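As a rough illustration of the two schemes mentioned above, the following base R sketch builds weight matrices of the shape that folds expects. The helper names kfold_weights and bootstrap_weights are hypothetical and not part of mboost, and the sample size is only illustrative.

```r
## Hypothetical helpers (not part of mboost) that build weight matrices
## with one row per observation and one column per run.

## k-fold cross-validation: weight 0 marks a test case, weight 1 a training case
kfold_weights <- function(n, k) {
  ## assign every observation to exactly one of k folds, in random order
  fold_id <- sample(rep(seq_len(k), length.out = n))
  ## column j is 0 where the observation belongs to fold j, 1 otherwise
  1 * outer(fold_id, seq_len(k), "!=")
}

## bootstrap: each column counts how often an observation was drawn;
## observations that were never drawn get weight 0 (test cases)
bootstrap_weights <- function(n, B) {
  rmultinom(B, size = n, prob = rep(1, n) / n)
}

set.seed(1)
n <- 71                                  # illustrative sample size
cv10 <- kfold_weights(n, k = 10)
bs25 <- bootstrap_weights(n, B = 25)

dim(cv10)                                # n rows, k columns
table(rowSums(cv10 == 0))                # how often each observation is a test case
```

Either matrix could then be passed as the folds argument, for example cvrisk(model, folds = cv10) for a fitted model.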

Value

An object of class cvrisk, basically a matrix containing estimates of the empirical risk for a varying number of boosting iterations. plot and print methods are available, as well as an mstop method.
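To make the idea of choosing a stopping iteration from such a matrix concrete, here is a small base R sketch on simulated numbers. It assumes one row per cross-validation run and one column per grid value, and it uses the column mean as the summary; both are assumptions for illustration, not a description of how the mstop method is implemented.

```r
## Simulated stand-in for a matrix of risk estimates (not produced by cvrisk).
## Assumed layout: rows = cross-validation runs, columns = grid values.
set.seed(42)
grid <- 1:100
## artificial risk curve with its minimum at 40 iterations
truth <- (grid - 40)^2 / 1000 + 1
## ten noisy copies of the curve, one per run
risk <- t(replicate(10, truth + rnorm(length(grid), sd = 0.05)))

## average the risk over runs and pick the grid value with the smallest average
avg_risk <- colMeans(risk)
best <- grid[which.min(avg_risk)]
best
```

On a real cvrisk object, mstop(cvm) should be used instead of recomputing the minimum by hand.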

Note

The model object needs to be fitted with option savedata = TRUE in boost_control.

References

Torsten Hothorn, Friedrich Leisch, Achim Zeileis and Kurt Hornik (2005), The design and analysis of benchmark experiments. Journal of Computational and Graphical Statistics, 14(3), 675–699.

Examples


  data("bodyfat", package = "mboost")

  ### fit linear model to data
  model <- glmboost(DEXfat ~ ., data = bodyfat,
                    control = boost_control(center = TRUE))

  ### AIC-based selection of number of boosting iterations
  maic <- AIC(model)
  maic

  ### inspect coefficient path and AIC-based stopping criterion
  par(mai = par("mai") * c(1, 1, 1, 1.8))
  plot(model)
  abline(v = mstop(maic), col = "lightgray")

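  ### 10-fold cross-validation
  n <- nrow(bodyfat)
  k <- 10
  ntest <- floor(n / k)
  ### weight matrix (n rows, k columns): a pattern of ntest zeros followed
  ### by n ones is repeated and filled column by column, so the block of
  ### zero weights (test cases) moves down by ntest rows in each column;
  ### the last column uses all remaining observations as test cases
  cv10f <- matrix(c(rep(c(rep(0, ntest), rep(1, n)), k - 1),
                    rep(0, n * k - (k - 1) * (n + ntest))), nrow = n)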
  cvm <- cvrisk(model, folds = cv10f)
  print(cvm)
  mstop(cvm)
  plot(cvm)

  ### 25 bootstrap iterations
  set.seed(290875)
  bs25 <- rmultinom(25, n, rep(1, n)/n)
  cvm <- cvrisk(model, folds = bs25)
  print(cvm)
  mstop(cvm)

  layout(matrix(1:2, ncol = 2))
  plot(cvm)

  ### trees
  blackbox <- blackboost(DEXfat ~ ., data = bodyfat)
  cvtree <- cvrisk(blackbox, folds = bs25)
  plot(cvtree)


[Package mboost version 1.1-0 Index]