TWIX {TWIX}    R Documentation

Trees with extra splits

Description

Trees with extra splits

Usage

TWIX(formula, data = NULL, test.data = 0, subset = NULL,
        method = "deviance", topn.method = "complete", cluster = NULL,
        minsplit = 30, minbucket = round(minsplit/3), Devmin = 0.05,
        splitf = "entropy", topN = 1, level = 30, st = 1, cl.level = 2,
        tol = 0.15, score = 1, k = 0, trace.plot = FALSE, robust = FALSE, ...)

Arguments

formula formula of the form y ~ x1 + x2 + ..., where y must be a factor and x1,x2,... are numeric or factor.
data an optional data frame containing the variables in the model (training data).
test.data This can be a data frame containing new data, 0 (default), or "NULL". If set to "NULL", the bad observations will be specified.
subset an optional vector specifying a subset of observations to be used.
method Which split points will be used? This can be "deviance" (default), "grid" or "local". If the method is set to:
"local" - the program uses the local maxima of the split function (entropy),
"deviance" - all values of the entropy,
"grid" - grid points.
topn.method one of "complete" (default) or "single". Specifies how the split points are considered. If set to "complete", split points from all variables are used; otherwise split points are used per variable.
cluster name of the cluster, if parallel computing is to be used.
minsplit the minimum number of observations that must exist in a node.
minbucket the minimum number of observations in any terminal <leaf> node.
Devmin the minimum improvement on entropy by splitting.
topN integer vector. How many splits will be selected and at which level? If length 1, the same number of splits will be selected at each level. If length > 1, for example topN=c(3,2), 3 splits will be chosen at the first level, 2 splits at the second level, and 1 split at all subsequent levels.
level maximum depth of the trees. If level is set to 1, the trees consist of the root node only.
st step parameter for method "grid".
cl.level parameter for parallel computing.
tol parameter which will be used if topn.method is set to "single".
score a parameter which can be 1 (default) or 2. If it is set to 2, the sort function will be used;
if it is set to 1, the weight function will be used:
score = 0.25*scale(dev.tr)+0.6*scale(fit.tr)+0.15*(tree.structure)
k k-fold cross-validation of the split function. k specifies the fraction of observations which will be taken into the hold-out sample (k can be in (0,0.5)).
trace.plot Should the trace plot be plotted?
splitf kind of splitting function to be used. It can be one of "entropy" (default) or "p-adj". If set to "p-adj", the p-value adjusted classification tree will be built.
robust If set to TRUE, robust entropy estimation will be used.
... further arguments to be passed to or from methods.
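
The rule described for topN can be sketched in base R. This is only an illustration of the documented behaviour, not code from the package: the helper expand_topN and its argument n_levels are hypothetical names, and how TWIX itself counts levels internally is an assumption here.

```r
# Hypothetical helper (not part of TWIX): expand a topN vector into the
# number of splits selected at each of n_levels levels, following the
# rule stated for the topN argument.
expand_topN <- function(topN, n_levels) {
  stopifnot(length(topN) >= 1, n_levels >= 1)
  if (length(topN) == 1) {
    # A single value applies to every level.
    return(rep(topN, n_levels))
  }
  # Levels beyond the supplied values fall back to one split each.
  padded <- c(topN, rep(1, max(0, n_levels - length(topN))))
  padded[seq_len(n_levels)]
}

expand_topN(c(3, 2), n_levels = 5)  # 3 2 1 1 1
expand_topN(4, n_levels = 3)        # 4 4 4
```

Under this reading, topN=c(3,2) selects several splits only near the root, and every deeper level contributes a single split.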

Value

a list with the following components:

call the call generating the object.
trees a list of all constructed trees, which include ID, Dev ... for each tree.
greedy.tree greedy tree
multitree database
agg.id vector specifying trees for aggregation.
Bad.id ID-vector of bad observations from the training data.

References

Martin Theus, Sergej Potapov and Simon Urbanek (2006). TWIX (Talk given at the 3rd Ensemble Workshop in Munich 2006) http://theusrus.de/Talks/Talks/TWIX.pdf

Lausen, B., Hothorn, T., Bretz, F. and Schumacher, M. (2004). Assessment of Optimally Selected Prognostic Factors. Biometrical Journal 46, 364-374.

See Also

get.tree, predict.TWIX, print.single.tree, plot.TWIX, deviance.TWIX

Examples


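library(TWIX)

data(olives)
set.seed(123)
## hold out 150 of the 572 observations as test data
i <- sample(572, 150)
ic <- setdiff(1:572, i)
training <- olives[ic, ]
test <- olives[i, ]

## select 4 splits at the first level and 3 at the second
TM1 <- TWIX(Region ~ ., data = training[, 1:9], topN = c(4, 3), method = "local")
TM1$trees
pred <- predict(TM1, newdata = test, sq = 1)
predict(TM1, newdata = test, sq = 1:2, ccr = TRUE)$CCR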

##
## the p-value adjusted classification tree

data(iris)
TM2 <- TWIX(Species ~ ., data = iris, splitf = "p-adj")
predict(TM2, newdata = iris, sq = 1, ccr = TRUE)$CCR


[Package TWIX version 0.2.4 Index]