bootTWIX {TWIX}    R Documentation

Bootstrap of the TWIX trees

Description

Grows greedy TWIX trees on bootstrap samples of the training data (bagging).
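
As a rough illustration of the resampling idea behind bagging, the following base-R sketch draws one bootstrap sample and one subsample from a toy data frame. It is not taken from the TWIX source; the data frame `d` and the names `idx_boot`, `idx_sub` and `oob` are made up for the example, and the way bootTWIX itself draws its samples is assumed rather than confirmed.

```r
# Toy data, purely illustrative (not from the package)
set.seed(1)
d <- data.frame(y = factor(rep(c("a", "b"), each = 5)), x = 1:10)
n <- nrow(d)

# Bootstrap sample: n row indices drawn with replacement
idx_boot <- sample(n, size = n, replace = TRUE)
boot <- d[idx_boot, ]

# Rows that were never drawn ("out-of-bag" rows)
oob <- setdiff(seq_len(n), idx_boot)

# Subsample of 6 rows drawn without replacement (each row at most once)
idx_sub <- sample(n, size = 6, replace = FALSE)
sub <- d[idx_sub, ]
```

One tree would then be grown per resampled data set, and the predictions of all trees combined.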

Usage

bootTWIX(formula, data=NULL, nbagg=1, topN=1, subset=NULL,
                method="deviance", topn.method="complete",
                replace = TRUE, ns = 1, cluster=NULL, minsplit=2,
                minbucket=round(minsplit/3), splitf="deviance", 
                Devmin=0.05, level=30, tol=0.01)

Arguments

formula formula of the form y ~ x1 + x2 + ..., where y must be a factor and x1,x2,... are numeric.
data an optional data frame containing the variables in the model (training data).
nbagg an integer giving the number of bootstrap replications.
topN integer vector giving the number of splits to select at each level. If length 1, the same number of splits is selected at every level. If length > 1, for example topN=c(3,2), 3 splits are chosen at the first level, 2 splits at the second level, and 1 split at every subsequent level.
subset an optional vector specifying a subset of observations to be used.
method Which split points are used. One of "deviance" (default), "grid" or "local":
"local" - the local maxima of the split function (entropy),
"deviance" - all values of the entropy,
"grid" - grid points.
topn.method one of "complete" (default) or "single". Specifies how candidate split points are considered: "complete" uses split points from all variables, otherwise split points are taken per variable.
replace Should sampling be with replacement?
ns size of the data set (ns <= nrow(data)) obtained by sampling without replacement.
cluster name of the cluster, if parallel computing is to be used.
minsplit the minimum number of observations that must exist in a node.
minbucket the minimum number of observations in any terminal (leaf) node.
splitf the splitting function to be used. One of "deviance" (default) or "p-adj". If splitf is set to "p-adj", p-value adjusted classification trees are fitted.
Devmin the minimum improvement in entropy required for a split or, for p-value adjusted classification trees, the significance level alpha.
level maximum depth of the trees. If level is set to 1, the trees consist of the root node only.
tol parameter that is used only if topn.method is set to "single".
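
The rule described for topN can be sketched in a few lines of base R. The helper `expand_topN` below is hypothetical and not part of TWIX; it only mirrors the documented behaviour (a single value is reused at every level, a longer vector is padded with 1 for the remaining levels) and makes no claim about how the package implements it.

```r
# Hypothetical helper, not part of TWIX: number of splits kept at each level
expand_topN <- function(topN, levels) {
  if (length(topN) == 1) {
    # a single value applies to every level
    return(rep(topN, levels))
  }
  # listed values first, then 1 split for all remaining levels
  padded <- c(topN, rep(1, max(0, levels - length(topN))))
  padded[seq_len(levels)]
}

expand_topN(c(3, 2), 5)   # 3 2 1 1 1
expand_topN(2, 4)         # 2 2 2 2

# Default minbucket for the default minsplit = 2
round(2 / 3)              # 1
```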

Value

A list with the following components:

call the call generating the object.
trees a list of all constructed trees, which includes ID, Dev, ... for each tree.

See Also

TWIX, get.tree, predict.bootTWIX, deviance.TWIX, bagg.TWIX

Examples

    library(ElemStatLearn)
    data(SAheart)

    ### response variable must be a factor
    SAheart$chd <- factor(SAheart$chd) 

    ### split into test (30%) and training (70%) data
    set.seed(1234)
    icv <- sample(nrow(SAheart), floor(nrow(SAheart)*0.3))
    itr <- setdiff(seq_len(nrow(SAheart)), icv)
    train <- SAheart[itr,]
    test <- SAheart[icv,]

    ### Bagging with greedy decision trees as base classifier
    M1 <- bootTWIX(chd~.,data=train,nbagg=100)

    ### Bagging with the p-value adjusted classification trees as base classifier
    M2 <- bootTWIX(chd~.,data=train,nbagg=100,splitf="p-adj",Devmin=0.01)

    pred1 <- predict(M1,test,sq=1:length(M1$trees))
    pred2 <- predict(M2,test,sq=1:length(M2$trees))

    ### Correct classification rates (CCR)
  
    sum(pred1 == test$chd)/nrow(test)
    sum(pred2 == test$chd)/nrow(test)
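
The correct classification rate used above is simply the share of matching labels. The following self-contained sketch shows the same computation, plus a confusion matrix, on two small made-up factor vectors; `observed` and `predicted` are illustrative stand-ins for `test$chd` and the output of `predict`, not real results.

```r
# Made-up labels, for illustration only
observed  <- factor(c(0, 1, 0, 0, 1, 1, 0, 1), levels = c(0, 1))
predicted <- factor(c(0, 1, 1, 0, 1, 0, 0, 1), levels = c(0, 1))

# Correct classification rate: proportion of matching labels
ccr <- mean(predicted == observed)

# Confusion matrix: rows are predictions, columns are observed classes
conf <- table(predicted = predicted, observed = observed)

ccr                          # 0.75
sum(diag(conf)) / sum(conf)  # same value, computed from the table
```

Both factors need the same level set for `==` to work, which is why the levels are given explicitly here.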

[Package TWIX version 0.2.6 Index]