GAMens.cv {GAMens} | R Documentation |
In v-fold cross-validation, the data are divided into v subsets of approximately equal size. In each round, one of the v subsets is held out while the remainder of the data is used to create a GAMens object, and predictions are generated for the held-out subset. The process is repeated v times, so that every subset is held out exactly once.
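The procedure described above can be sketched in base R. This is only an illustration of v-fold cross-validation under stated assumptions: the simulated data frame `dat`, the variable names `x1`, `x2`, `y`, and the use of `glm` as a stand-in for the GAMens ensemble are all hypothetical and do not reflect the internals of GAMens.cv.

```r
# Sketch of v-fold cross-validation; glm() stands in for the ensemble classifier.
set.seed(42)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
dat$y <- factor(ifelse(dat$x1 + rnorm(100) > 0, "A", "B"))

v <- 5
n <- nrow(dat)

# Assign every observation to one of v folds of (approximately) equal size
fold <- sample(rep(seq_len(v), length.out = n))

# Out-of-fold predicted probabilities, filled in one fold at a time
pred <- rep(NA_real_, n)
for (k in seq_len(v)) {
  held_out <- fold == k
  # Fit on the v - 1 remaining folds
  fit <- glm(y ~ x1 + x2, data = dat[!held_out, ], family = binomial)
  # Predict only the observations that were left out of the fit
  pred[held_out] <- predict(fit, newdata = dat[held_out, ], type = "response")
}
```

After the loop, every observation has exactly one prediction, produced by a model that did not see it during fitting.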
GAMens.cv(formula, data, cv, rsm_size=2, autoform=FALSE, iter=10, df=4, bagging=TRUE, rsm=TRUE, fusion="avgagg")
formula |
a formula, as in the gam function. Smoothing splines are supported
as nonparametric smoothing terms, and should be indicated by s. See the documentation of s in the
gam package for its arguments. The GAMens function also provides the possibility for automatic
formula specification. See 'details' for more information. |
data |
a data frame in which to interpret the variables named in formula. |
cv |
an integer specifying the number of folds in the cross-validation. |
rsm_size |
an integer, the number of variables in each random feature subset drawn by the Random Subspace Method. Default is 2.
If rsm=FALSE, the value of rsm_size is ignored. |
autoform |
if FALSE (the default), the model specification in formula is used. If TRUE,
the function triggers automatic formula specification. See 'details' for more information. |
iter |
an integer, the number of base (member) classifiers (GAMs) in the ensemble. Defaults to iter=10
base classifiers. |
df |
an integer, the number of degrees of freedom (df) used for smoothing spline estimation. Defaults to df=4.
Its value is only used when autoform=TRUE; it is ignored if a formula is
specified and autoform is FALSE. |
bagging |
enables Bagging if value is TRUE (default). If FALSE,
Bagging is disabled. Either bagging, rsm or both should be TRUE. |
rsm |
enables the Random Subspace Method (RSM) if value is TRUE (default). If FALSE,
RSM is disabled. Either bagging, rsm or both should be TRUE. |
fusion |
specifies the fusion rule for the aggregation of member classifier outputs in the ensemble. Possible values are
'avgagg' for average aggregation (default), 'majvote' for majority voting, 'w.avgagg' for
weighted average aggregation based on base classifier error rates, or 'w.majvote' for weighted majority
voting. |
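The roles of bagging, rsm, rsm_size and fusion can be pictured with a toy example in base R. This is a sketch, not the package's internal code: the objects `boot_rows`, `subspaces` and `member_prob`, the probability values, and the 0.5 cutoff used to turn probabilities into classes are all assumptions made for illustration. Only the unweighted rules ('avgagg' and 'majvote') are shown.

```r
# Toy illustration of the ensemble arguments; not the package's internal code.
set.seed(1)
n <- 8          # number of training observations (hypothetical)
iter <- 3       # number of ensemble members
rsm_size <- 2   # variables per random feature subset
vars <- paste0("V", 1:5)

# bagging=TRUE: each member is trained on a bootstrap sample of the rows
boot_rows <- replicate(iter, sample(n, n, replace = TRUE), simplify = FALSE)

# rsm=TRUE: each member is trained on rsm_size randomly chosen variables
subspaces <- replicate(iter, sample(vars, rsm_size), simplify = FALSE)

# Hypothetical member outputs: rows are observations, columns are members
member_prob <- matrix(c(0.90, 0.60, 0.20,
                        0.45, 0.95, 0.45,
                        0.10, 0.30, 0.45,
                        0.80, 0.55, 0.90),
                      nrow = 4, byrow = TRUE)

# Average aggregation: mean of the member probabilities, then one cutoff
avg_prob <- rowMeans(member_prob)
class_avgagg <- as.integer(avg_prob > 0.5)

# Majority voting: each member votes with its own cutoff, the majority wins
votes <- rowSums(member_prob > 0.5)
class_majvote <- as.integer(votes > iter / 2)
```

In this toy matrix the second observation is assigned to the positive class under average aggregation (one very confident member pulls the mean above 0.5) but to the negative class under majority voting (only one of three members votes positive), which is the kind of disagreement the fusion argument controls.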
An object of class GAMens.cv, which is a list with the following components:
foldpred |
a data frame with, per fold, predicted class membership probabilities for the left-out observations. |
pred |
a data frame with predicted class membership probabilities. |
foldclass |
a data frame with, per fold, predicted classes for the left-out observations. |
class |
a data frame with predicted classes. |
conf |
the confusion matrix which compares the real versus predicted class memberships, based on the class object. |
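A confusion matrix of the kind described for conf can be built in base R with table. The sketch below uses made-up label vectors `actual` and `predicted`; whether GAMens.cv places the real classes in rows or in columns is not stated here, so the layout shown is an assumption.

```r
# Hypothetical real and predicted class labels for six observations
actual    <- factor(c("M", "R", "R", "M", "R", "M"), levels = c("M", "R"))
predicted <- factor(c("M", "R", "M", "M", "R", "R"), levels = c("M", "R"))

# Cross-tabulate real versus predicted classes (rows: actual, columns: predicted)
conf <- table(actual = actual, predicted = predicted)

# Overall accuracy: share of observations on the diagonal
accuracy <- sum(diag(conf)) / sum(conf)
```

Here four of the six observations lie on the diagonal, so `accuracy` is 4/6.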
Koen W. De Bock Koen.DeBock@UGent.be, Kristof Coussement K.Coussement@Ieseg.fr and Dirk Van den Poel Dirk.VandenPoel@UGent.be
De Bock, K. W., Coussement, K. and Van den Poel, D. (2010): "Ensemble Classification based on generalized additive models". Computational Statistics & Data Analysis, doi:10.1016/j.csda.2009.12.013.
Breiman, L. (1996): "Bagging predictors". Machine Learning, Vol 24, 2, pp. 123–140.
Hastie, T. and Tibshirani, R. (1990): "Generalized Additive Models", Chapman and Hall, London.
Ho, T. K. (1998): "The random subspace method for constructing decision forests". IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 20, 8, pp. 832–844.
## Load data (the mlbench package must be installed)
library(GAMens)
library(mlbench)
data(Sonar)

## Perform 10-fold cross-validation of a GAMrsm ensemble on the Sonar data,
## using all variables
Sonar.cv.GAMrsm <- GAMens.cv(Class~., Sonar, 10, 3, autoform=TRUE,
                             iter=10, bagging=FALSE, rsm=TRUE)

## Compare classification performance of GAMens, GAMrsm and GAMbag
## ensembles, using six variables of the Sonar dataset, based on 5-fold
## cross-validation runs
Sonar.cv.GAMens <- GAMens.cv(Class~s(V1,4)+s(V2,3)+s(V3,4)+V4+V5+V6, Sonar, 5, 4,
                             autoform=FALSE, iter=10)
Sonar.cv.GAMrsm <- GAMens.cv(Class~s(V1,4)+s(V2,3)+s(V3,4)+V4+V5+V6, Sonar, 5, 4,
                             autoform=FALSE, iter=10, bagging=FALSE, rsm=TRUE)
Sonar.cv.GAMbag <- GAMens.cv(Class~s(V1,4)+s(V2,3)+s(V3,4)+V4+V5+V6, Sonar, 5, 4,
                             autoform=FALSE, iter=10, bagging=TRUE, rsm=FALSE)

## Calculate AUCs (the colAUC function is in the caTools package).
## [[2]] selects the second list component (pred, going by the order of
## the components listed above)
library(caTools)
GAMens.cv.auc <- colAUC(Sonar.cv.GAMens[[2]], Sonar["Class"]=="R", plotROC=FALSE)
GAMrsm.cv.auc <- colAUC(Sonar.cv.GAMrsm[[2]], Sonar["Class"]=="R", plotROC=FALSE)
GAMbag.cv.auc <- colAUC(Sonar.cv.GAMbag[[2]], Sonar["Class"]=="R", plotROC=FALSE)