bootTWIX {TWIX} | R Documentation |
Bootstrap samples of the Greedy-TWIX-trees.
bootTWIX(formula, data=NULL, nbagg=1, topN=1, subset=NULL, method="deviance", topn.method="complete", replace = TRUE, ns = 1, cluster=NULL, minsplit=2, minbucket=round(minsplit/3), splitf="deviance", Devmin=0.05, level=30, tol=0.01)
formula |
a formula of the form y ~ x1 + x2 + ...,
where y must be a factor and x1, x2, ... are numeric. |
data |
an optional data frame containing the variables in the model (training data). |
nbagg |
an integer giving the number of bootstrap replications. |
topN |
integer vector giving the number of splits to select at each level.
If length 1, the same number of splits is selected at every level.
If length > 1, for example topN=c(3,2), 3 splits are chosen
at the first level, 2 splits at the second level, and 1 split at every subsequent level. |
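The rule above can be illustrated with a small base R sketch. The helper `splits_at_level` is hypothetical and not part of the TWIX package; it only mirrors the behaviour described for topN.

```r
# Hypothetical helper (not a TWIX function): number of splits selected
# at a given tree level according to the topN rule described above.
splits_at_level <- function(topN, level) {
  # A single value applies to every level.
  if (length(topN) == 1) return(topN)
  # Otherwise use the entry for this level, and 1 split beyond the vector.
  if (level <= length(topN)) topN[level] else 1
}

per_level_vec    <- sapply(1:4, splits_at_level, topN = c(3, 2))  # 3 2 1 1
per_level_scalar <- sapply(1:3, splits_at_level, topN = 2)        # 2 2 2
```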
subset |
an optional vector specifying a subset of observations to be used. |
method |
Which split points will be used? This can be "deviance"
(default), "grid" or "local". If the method is set to
"local", the program uses the local maxima of the split function (entropy);
"deviance", all values of the entropy;
"grid", grid points. |
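To make the difference between using all values of the split function and using only its local maxima concrete, here is a minimal base R sketch. The functions `entropy`, `split_gain` and `local_maxima` are illustrative assumptions and not the TWIX implementation; the "grid" option is not sketched because the page does not say how the grid is built.

```r
# Entropy of a class vector (natural logarithm).
entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]                      # drop empty classes to avoid log(0)
  -sum(p * log(p))
}

# Entropy reduction for every candidate cut point of a numeric variable x.
# Cut points are taken as midpoints between consecutive distinct values.
split_gain <- function(x, y) {
  xs   <- sort(unique(x))
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2
  gain <- sapply(cuts, function(cut) {
    left <- x <= cut
    entropy(y) - mean(left) * entropy(y[left]) - mean(!left) * entropy(y[!left])
  })
  data.frame(cut = cuts, gain = gain)
}

# Positions at which the gain is at least as large as at both neighbours.
local_maxima <- function(g) {
  n <- length(g)
  which(g >= c(-Inf, g[-n]) & g >= c(g[-1], -Inf))
}

x <- c(1, 2, 3, 4, 5, 6)
y <- factor(c("a", "a", "a", "b", "b", "b"))

all_cuts   <- split_gain(x, y)                        # all values of the split function
local_cuts <- all_cuts[local_maxima(all_cuts$gain), ] # local maxima only
```

In this toy example every midpoint is a candidate in `all_cuts`, while `local_cuts` keeps only the cut at 3.5, which separates the two classes perfectly.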
topn.method |
one of "complete" (default) or "single".
Specifies how the split points are considered.
If set to "complete", split points from all variables are used together;
otherwise split points are selected per variable. |
replace |
Should sampling be with replacement? |
ns |
size of the data set (ns <= nrow(data)) drawn when sampling without replacement. |
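The two sampling schemes controlled by replace and ns can be sketched in base R as follows. The helper `draw_indices` is hypothetical; in particular, drawing as many observations as the data has rows when replace = TRUE is an assumption (the usual bootstrap convention), not something this page states.

```r
# Hypothetical helper (not a TWIX function): row indices for one replication.
draw_indices <- function(n, replace = TRUE, ns = n) {
  if (replace) {
    # Bootstrap sample: n draws with replacement (assumed size).
    sample.int(n, size = n, replace = TRUE)
  } else {
    # Subsample of size ns <= n without replacement.
    stopifnot(ns <= n)
    sample.int(n, size = ns, replace = FALSE)
  }
}

set.seed(1)
boot_idx <- draw_indices(10)                          # may contain repeats
sub_idx  <- draw_indices(10, replace = FALSE, ns = 6) # 6 distinct rows
```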
cluster |
name of the cluster, if parallel computing is to be used. |
minsplit |
the minimum number of observations that must exist in a node. |
minbucket |
the minimum number of observations in any terminal (leaf) node. |
splitf |
the kind of splitting function to be used.
It can be one of "deviance" (default) or "p-adj".
If splitf is set to "p-adj", a p-value adjusted classification tree is built. |
Devmin |
the minimum improvement in entropy required for a split or, for p-value adjusted classification trees, the significance level alpha. |
level |
maximum depth of the trees. If level is set to 1, the trees consist of the root node only. |
tol |
parameter that is used only if topn.method is set to "single". |
a list with the following components:
call |
the call generating the object. |
trees |
a list of all constructed trees, which includes ID, Dev, ... for each tree. |
TWIX, get.tree, predict.bootTWIX, deviance.TWIX, bagg.TWIX
library(ElemStatLearn)
data(SAheart)

### response variable must be a factor
SAheart$chd <- factor(SAheart$chd)

### test and train data
set.seed(1234)
icv <- sample(nrow(SAheart), nrow(SAheart)*0.3)
itr <- setdiff(1:nrow(SAheart), icv)
train <- SAheart[itr,]
test <- SAheart[icv,]

### Bagging with greedy decision trees as base classifier
M1 <- bootTWIX(chd~., data=train, nbagg=100)

### Bagging with the p-value adjusted classification trees as base classifier
M2 <- bootTWIX(chd~., data=train, nbagg=100, splitf="p-adj", Devmin=0.01)

pred1 <- predict(M1, test, sq=1:length(M1$trees))
pred2 <- predict(M2, test, sq=1:length(M2$trees))

### CCR's
sum(pred1 == test$chd)/nrow(test)
sum(pred2 == test$chd)/nrow(test)
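The correct classification rate (CCR) at the end of the example is the share of test cases whose predicted class equals the true class. As a package-independent illustration, the sketch below aggregates hypothetical per-tree predictions by majority vote and then computes the CCR. The `votes` matrix and the helper `majority_vote` are made up for this example; how predict.bootTWIX combines the trees internally is not described on this page.

```r
# Hypothetical per-tree class predictions: one row per test case,
# one column per bagged tree.
votes <- cbind(tree1 = c("0", "1", "1"),
               tree2 = c("0", "0", "1"),
               tree3 = c("1", "0", "1"))

# Majority vote across trees; ties go to the first class in sorted order.
majority_vote <- function(votes) {
  apply(votes, 1, function(v) names(which.max(table(v))))
}

pred  <- majority_vote(votes)   # "0" "0" "1"
truth <- c("0", "1", "1")

# Correct classification rate: proportion of matching labels.
ccr <- mean(pred == truth)      # 2/3
```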