cvrisk {mboost} | R Documentation |
Cross-validated estimation of the empirical risk for hyper-parameter selection.
cvrisk(object, folds, grid = c(1:mstop(object)), ...)
object | an object of class gb.
folds | a weight matrix with number of rows equal to the number of observations. The number of columns corresponds to the number of cross-validation runs.
grid | a vector of stopping parameters the empirical risk is to be evaluated for.
... | additional arguments passed on to mclapply.
The number of boosting iterations is a hyper-parameter of the
boosting algorithms implemented in this package. This function
computes honest, i.e., cross-validated, estimates of the empirical
risk for different stopping parameters mstop. These estimates can be
used to choose an appropriate number of boosting iterations.
Different forms of cross-validation can be applied, for example
10-fold cross-validation or bootstrapping. The weights (zero weights
correspond to test cases) are defined via the folds matrix.
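As a rough sketch of how such weight matrices could be built in base R: the helper names kfold_weights and bootstrap_weights below are hypothetical and not part of mboost, and the random assignment of observations to folds is a choice made for this illustration. The only convention taken from the text above is that a zero weight marks a test case.

```r
## Hypothetical helpers (not part of mboost): build weight matrices
## with one row per observation and one column per cross-validation run.

## k-fold cross-validation: each observation gets weight 0 (test case)
## in exactly one column and weight 1 (training case) in all others.
kfold_weights <- function(n, k) {
    fold <- sample(rep(seq_len(k), length.out = n))  # random fold labels
    w <- matrix(1, nrow = n, ncol = k)
    w[cbind(seq_len(n), fold)] <- 0
    w
}

## Bootstrap: each column holds multinomial counts that sum to n;
## observations that are never drawn get weight 0 (test cases).
bootstrap_weights <- function(n, B) {
    rmultinom(B, size = n, prob = rep(1, n) / n)
}

set.seed(1)
cv10f <- kfold_weights(n = 71, k = 10)
bs25 <- bootstrap_weights(n = 71, B = 25)
dim(cv10f)  # 71 observations (rows) by 10 runs (columns)
dim(bs25)   # 71 observations (rows) by 25 runs (columns)
```

Either matrix could then be supplied as the folds argument.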
If package multicore is available, cvrisk runs in parallel on the
available cores/processors. The scheduling can be changed via the
corresponding arguments of mclapply (passed through the dot arguments).
No trace output is given when running in parallel.
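The idea of one model fit per column of the weight matrix, with the dot arguments forwarded to mclapply, can be sketched in base R. This is not the mboost implementation: a weighted linear model stands in for the boosted model, mclapply is taken from the parallel package (which absorbed the multicore code), and risk_one_run and cv_risk_sketch are hypothetical names used only for this illustration.

```r
library(parallel)  # provides mclapply()

## Toy data; a weighted linear model stands in for the boosted model.
set.seed(2)
n <- 60
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)

## 5-fold weight matrix: zero weights mark the test cases of each run.
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))
folds <- matrix(1, nrow = n, ncol = k)
folds[cbind(seq_len(n), fold)] <- 0

## Hypothetical helper: fit on the weighted data and return the mean
## squared error on the zero-weight (held-out) observations.
risk_one_run <- function(w, data) {
    data$w <- w
    fit <- lm(y ~ x, data = data, weights = w)
    test <- data[w == 0, , drop = FALSE]
    mean((test$y - predict(fit, newdata = test))^2)
}

## Hypothetical wrapper: the dot arguments are forwarded to mclapply(),
## so scheduling options such as mc.cores can be set by the caller.
cv_risk_sketch <- function(data, folds, ...) {
    runs <- seq_len(ncol(folds))
    unlist(mclapply(runs, function(j) risk_one_run(folds[, j], data), ...))
}

risks <- cv_risk_sketch(dat, folds, mc.cores = 1)  # one risk per run
mean(risks)
```

With mc.cores = 1 the runs are evaluated sequentially; a larger value would distribute them over several processes on platforms that support forking.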
An object of class cvrisk, basically a matrix containing estimates
of the empirical risk for a varying number of boosting iterations.
plot and print methods are available, as well as an mstop method.
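To illustrate how such a matrix of risk estimates might be used, the sketch below assumes one row per cross-validation run and one column per value of grid, and picks the grid value with the smallest average risk. Both the layout and the selection rule are assumptions made for this example, the numbers are made up, and pick_stop is a hypothetical helper rather than the package's mstop method.

```r
## Made-up risk estimates: 3 cross-validation runs (rows) evaluated
## at 4 candidate stopping iterations (columns).
grid <- c(10, 50, 100, 200)
risk <- matrix(c(9.0, 5.0, 4.0, 4.5,
                 8.0, 4.5, 4.2, 4.8,
                 9.5, 5.5, 3.9, 4.1),
               nrow = 3, byrow = TRUE,
               dimnames = list(NULL, as.character(grid)))

## Hypothetical helper: average the risk over runs and return the
## grid value with the smallest average.
pick_stop <- function(risk, grid) {
    grid[which.min(colMeans(risk))]
}

colMeans(risk)         # average risk per candidate stopping iteration
pick_stop(risk, grid)  # grid value with the smallest average risk
```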
The model object needs to be fitted with option savedata = TRUE
in boost_control.
Torsten Hothorn, Friedrich Leisch, Achim Zeileis and Kurt Hornik (2005), The design and analysis of benchmark experiments. Journal of Computational and Graphical Statistics, 14(3), 675–699.
AIC.gamboost or AIC.glmboost for AIC-based selection of the stopping
iteration. Use mstop to extract the optimal stopping iteration from
a cvrisk object.
data("bodyfat", package = "mboost")

### fit linear model to data
model <- glmboost(DEXfat ~ ., data = bodyfat,
                  control = boost_control(center = TRUE))

### AIC-based selection of number of boosting iterations
maic <- AIC(model)
maic

### inspect coefficient path and AIC-based stopping criterion
par(mai = par("mai") * c(1, 1, 1, 1.8))
plot(model)
abline(v = mstop(maic), col = "lightgray")

### 10-fold cross-validation
n <- nrow(bodyfat)
k <- 10
ntest <- floor(n / k)
cv10f <- matrix(c(rep(c(rep(0, ntest), rep(1, n)), k - 1),
                  rep(0, n * k - (k - 1) * (n + ntest))), nrow = n)
cvm <- cvrisk(model, folds = cv10f)
print(cvm)
mstop(cvm)
plot(cvm)

### 25 bootstrap iterations
set.seed(290875)
bs25 <- rmultinom(25, n, rep(1, n)/n)
cvm <- cvrisk(model, folds = bs25)
print(cvm)
mstop(cvm)

layout(matrix(1:2, ncol = 2))
plot(cvm)

### trees
blackbox <- blackboost(DEXfat ~ ., data = bodyfat)
cvtree <- cvrisk(blackbox, folds = bs25)
plot(cvtree)