TWIX {TWIX} | R Documentation |
Trees with extra splits
Usage
TWIX(formula, data = NULL, test.data = 0, subset = NULL, method = "deviance",
     topn.method = "complete", cluster = NULL, minsplit = 30,
     minbucket = round(minsplit/3), Devmin = 0.05, splitf = "entropy",
     topN = 1, level = 30, st = 1, cl.level = 2, tol = 0.15, score = 1,
     k = 0, trace.plot = FALSE, robust = FALSE, ...)
Arguments
formula |
a formula of the form y ~ x1 + x2 + ..., where y must be a factor and x1, x2, ... are numeric or factor variables. |
data |
an optional data frame containing the variables in the model (training data). |
test.data |
a data frame containing new data, 0 (default), or "NULL". If set to "NULL", the bad observations in the training data are identified (see Bad.id). |
subset |
an optional vector specifying a subset of observations to be used. |
method |
which split points are used: one of "deviance" (default), "grid" or "local". With "local", the local maxima of the split function (entropy) are used; with "deviance", all values of the entropy are used; with "grid", grid points are used (see st). |
topn.method |
one of "complete" (default) or "single", specifying how split points are considered. With "complete", the split points of all variables are considered together; with "single", split points are considered per variable. |
cluster |
name of the cluster, if parallel computing is used. |
minsplit |
the minimum number of observations that must exist in a node. |
minbucket |
the minimum number of observations in any terminal <leaf> node. |
Devmin |
the minimum improvement in entropy required for a split. |
topN |
integer vector giving the number of splits selected at each level. If of length 1, the same number of splits is selected at every level. If of length > 1, for example topN = c(3, 2), 3 splits are chosen at the first level, 2 at the second level, and 1 at every subsequent level. |
level |
maximum depth of the trees. If level is set to 1, the trees consist of the root node only. |
st |
step parameter for method "grid" . |
cl.level |
parameter for parallel computing. |
tol |
parameter used only when topn.method is set to "single". |
score |
either 1 (default) or 2. If 2, the sort function is used; if 1, the weight function
score = 0.25*scale(dev.tr) + 0.6*scale(fit.tr) + 0.15*(tree.structure)
is used. |
k |
cross-validation of the split function: k specifies the fraction of observations placed in the hold-out sample, with k in (0, 0.5). |
trace.plot |
logical; should a trace plot be drawn? |
splitf |
the splitting function to be used: one of "entropy" (default) or "p-adj". If set to "p-adj", a p-value adjusted classification tree is built. |
robust |
logical; if TRUE, robust entropy estimation is used. |
... |
further arguments to be passed to or from methods. |
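The method argument determines which candidate cut points of the split function are kept. The sketch below is a rough, hypothetical base-R illustration; the function names are invented here and this is not TWIX's internal code. It computes an entropy-reduction split function for one numeric predictor at every midpoint between distinct values, which is roughly what "deviance" is described as using. It then keeps only the local maxima, as described for "local". The "grid" variant and the exact entropy definition used by TWIX are not reproduced.

```r
# Hypothetical sketch, not TWIX internals: an entropy-based split function
# for one numeric predictor x and a factor response y.
entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]                      # drop empty classes: 0 * log(0) is NaN
  -sum(p * log(p))
}

# Entropy reduction of the binary split x <= s, evaluated at the midpoints
# between consecutive distinct values of x (all candidates, as for "deviance").
split_function <- function(x, y) {
  u <- sort(unique(x))
  cuts <- (head(u, -1) + tail(u, -1)) / 2
  h <- entropy(y)
  gain <- vapply(cuts, function(s) {
    left <- y[x <= s]
    right <- y[x > s]
    h - (length(left) * entropy(left) + length(right) * entropy(right)) / length(y)
  }, numeric(1))
  data.frame(cut = cuts, gain = gain)
}

# Keep only the local maxima of the split function (as for "local").
local_maxima <- function(sf) {
  g <- sf$gain
  n <- length(g)
  prev <- c(-Inf, g[-n])             # left neighbour of each candidate
  nxt <- c(g[-1], -Inf)              # right neighbour of each candidate
  sf[g >= prev & g >= nxt, ]
}

sf <- split_function(iris$Petal.Length, iris$Species)
peaks <- local_maxima(sf)
best <- peaks[which.max(peaks$gain), ]
best
```

On iris, the best peak should be the familiar cut Petal.Length <= 2.45, which separates setosa from the other two species.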
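The difference between the two topn.method settings can be sketched as follows. The sketch assumes that each variable already has a table of candidate cut points, each with a split-function value. The candidate values and the select_splits() helper are hypothetical. The role of tol in the "single" mode is not modelled.

```r
# Hypothetical sketch of the two topn.method modes (not TWIX internals).
# Each variable has candidate cut points with a split-function value ("gain");
# the numbers are made up for illustration.
candidates <- list(
  x1 = data.frame(cut = c(1.5, 2.5, 3.5), gain = c(0.10, 0.40, 0.38)),
  x2 = data.frame(cut = c(10, 20), gain = c(0.35, 0.05))
)

select_splits <- function(candidates, N, topn.method = c("complete", "single")) {
  topn.method <- match.arg(topn.method)
  # Stack all candidates into one data frame, tagged with the variable name.
  pool <- do.call(rbind, Map(function(v, d) data.frame(var = v, d, stringsAsFactors = FALSE),
                             names(candidates), candidates))
  rownames(pool) <- NULL
  # The N rows with the largest gain (fewer if not enough candidates).
  top_n <- function(d) d[order(-d$gain), ][seq_len(min(N, nrow(d))), ]
  if (topn.method == "complete") {
    top_n(pool)                                   # N best over all variables
  } else {
    res <- do.call(rbind, lapply(split(pool, pool$var), top_n))  # N best per variable
    rownames(res) <- NULL
    res
  }
}

res_c <- select_splits(candidates, N = 2, topn.method = "complete")
res_s <- select_splits(candidates, N = 1, topn.method = "single")
res_c
res_s
```

With these numbers, "complete" with N = 2 picks both cut points from x1, whereas "single" with N = 1 returns the best cut point of each variable.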
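The per-level rule documented for topN can be restated as a small helper. splits_per_level() is a hypothetical name. The code only encodes the rule described for topN; it is not TWIX's implementation.

```r
# Restates the documented topN rule (hypothetical helper, not TWIX code):
# a length-1 topN applies to every level; otherwise entry i is the number of
# splits at level i, and every level beyond length(topN) gets one split.
splits_per_level <- function(topN, depth) {
  stopifnot(depth >= 1)
  if (length(topN) == 1) return(rep(topN, depth))
  n <- rep(1, depth)
  m <- min(depth, length(topN))
  n[seq_len(m)] <- topN[seq_len(m)]
  n
}

splits_per_level(c(3, 2), depth = 5)
splits_per_level(4, depth = 3)
```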
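For score = 1, the documented weight function is a weighted sum of standardized terms. The lines below evaluate it on made-up values for three candidate trees. Treating dev.tr, fit.tr and tree.structure as numeric vectors with one entry per tree is an assumption, since their exact definitions are not given here.

```r
# Illustration of the documented weight function used when score = 1.
# Assumption: dev.tr, fit.tr and tree.structure are numeric vectors with one
# entry per candidate tree; the values below are made up.
dev.tr <- c(1, 2, 3)
fit.tr <- c(0.9, 0.8, 0.7)
tree.structure <- c(0, 1, 0)

# scale() centres and standardises a vector and returns a one-column matrix,
# so as.numeric() turns each result back into a plain vector.
weighted_score <- 0.25 * as.numeric(scale(dev.tr)) +
  0.6 * as.numeric(scale(fit.tr)) +
  0.15 * tree.structure
weighted_score
```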
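One reading of k is sketched below: a fraction k of the observations is drawn at random into a hold-out sample. holdout_split() is hypothetical. The sketch does not show how TWIX then uses the hold-out part to validate the split function.

```r
# Hypothetical sketch: draw a hold-out sample containing a fraction k of the
# observations (0 < k < 0.5); the remaining observations form the training part.
holdout_split <- function(n, k) {
  stopifnot(k > 0, k < 0.5)
  hold <- sort(sample(n, round(k * n)))
  list(train = setdiff(seq_len(n), hold), holdout = hold)
}

set.seed(1)
idx <- holdout_split(nrow(iris), k = 0.2)
lengths(idx)
```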
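Choosing the best of many cut points makes even an uninformative predictor look useful, so the p-value of a selected split needs adjusting. The sketch below shows one generic way to do this: a permutation test on the maximally selected Pearson chi-square statistic. Both the statistic and the permutation approach are substituted here for illustration. They are not claimed to be what splitf = "p-adj" computes.

```r
# Swapped-in illustration, not necessarily the adjustment TWIX implements:
# a permutation p-value for the maximally selected Pearson chi-square
# statistic of one numeric predictor x against a factor y. Taking the maximum
# over all cut points in every permutation accounts for the search over cuts.
chisq_stat <- function(g, y) {
  tab <- table(g, y)
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - expected)^2 / expected)
}

max_chisq <- function(x, y) {
  u <- sort(unique(x))
  cuts <- (head(u, -1) + tail(u, -1)) / 2        # candidate cut points
  max(vapply(cuts, function(s) chisq_stat(x <= s, y), numeric(1)))
}

perm_pvalue <- function(x, y, B = 99) {
  observed <- max_chisq(x, y)
  null <- replicate(B, max_chisq(x, sample(y)))  # permute labels, keep x fixed
  (1 + sum(null >= observed)) / (B + 1)
}

set.seed(42)
p <- perm_pvalue(iris$Petal.Length, iris$Species)
p
```

For Petal.Length, the observed statistic should exceed every permuted maximum. This gives the smallest attainable p-value, 1/(B + 1) = 0.01.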
Value
a list with the following components:
call |
the call generating the object. |
trees |
a list of all constructed trees, which include ID, Dev ... for each tree. |
greedy.tree |
the greedy tree. |
multitree |
database |
agg.id |
vector specifying trees for aggregation. |
Bad.id |
ID vector of the bad observations in the training data. |
References
Theus, M., Potapov, S. and Urbanek, S. (2006). TWIX. Talk given at the 3rd Ensemble Workshop, Munich, 2006. http://theusrus.de/Talks/Talks/TWIX.pdf
Lausen, B., Hothorn, T., Bretz, F. and Schumacher, M. (2004). Assessment of Optimally Selected Prognostic Factors. Biometrical Journal 46, 364-374.
See Also
get.tree, predict.TWIX, print.single.tree, plot.TWIX, deviance.TWIX
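Examples
library(TWIX)
data(olives)
set.seed(123)
# hold out 150 of the 572 olive oil samples as a test set
i <- sample(572, 150)
ic <- setdiff(1:572, i)
training <- olives[ic, ]
test <- olives[i, ]
# 4 splits at the first level, 3 at the second, 1 at all further levels;
# split points taken from the local maxima of the entropy
TM1 <- TWIX(Region ~ ., data = training[, 1:9], topN = c(4, 3), method = "local")
TM1$trees
pred <- predict(TM1, newdata = test, sq = 1)
predict(TM1, newdata = test, sq = 1:2, ccr = TRUE)$CCR
##
## the p-value adjusted classification tree
data(iris)
TM2 <- TWIX(Species ~ ., data = iris, splitf = "p-adj")
predict(TM2, newdata = iris, sq = 1, ccr = TRUE)$CCR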