TWIX {TWIX} | R Documentation |
Trees with extra splits
TWIX(formula, data = NULL, test.data = NULL, subset = NULL, method = "deviance", topn.method = "complete", cluster = NULL, minsplit = 30, minbucket = round(minsplit/3), Devmin = 0.05, splitf = "deviance", topN = 1, level = 30, st = 1, cl.level = 2, tol = 0.15, score = 1, k = 0, verbose=FALSE, trace.plot=FALSE, ...)
formula |
a formula of the form y ~ x1 + x2 + ...,
where y must be a factor and x1, x2, ... are numeric or factor. |
data |
an optional data frame containing the variables in the model (training data). |
test.data |
an optional data frame containing new data (test data). |
subset |
an optional vector specifying a subset of observations to be used. |
method |
Which split points will be used? This can be "deviance"
(default), "grid" or "local". If the method is set to "local", the program uses the local maxima of the split function (entropy); if "deviance", all values of the entropy; if "grid", grid points. |
topn.method |
one of "complete" (default) or "single".
Specifies how the split points are considered.
If set to "complete", split points from all variables are used;
otherwise split points are used per variable. |
cluster |
the name of the cluster, if parallel computing is to be used. |
minsplit |
the minimum number of observations that must exist in a node. |
minbucket |
the minimum number of observations in any terminal (leaf) node. |
Devmin |
the minimum improvement in entropy required for a split.
If splitf is set to "p-adj", Devmin is
the significance level alpha. |
splitf |
the splitting function to be used. It can be one of "deviance" (default) or "p-adj".
If set to "p-adj", a p-value adjusted classification tree is fitted. |
topN |
integer vector. How many splits will be selected and at which
level? If length 1, the same number of splits is selected at each level.
If length > 1, for example topN=c(3,2), 3 splits are chosen
at the first level, 2 splits at the second level and 1 split at all subsequent levels. |
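The rule described for topN can be illustrated with a short base-R sketch. The helper splits_per_level and its argument n_levels are hypothetical names introduced here for illustration; they are not part of the TWIX package and only mirror the rule as stated above.

```r
# Hypothetical helper (not part of TWIX): expand a topN vector into the
# number of splits selected at each of n_levels splitting levels.
splits_per_level <- function(topN, n_levels) {
  topN <- as.integer(topN)
  if (length(topN) == 1L) {
    # a single value applies to every level
    return(rep(topN, n_levels))
  }
  # explicit values for the first levels, then 1 split for all further levels
  out <- c(topN, rep(1L, max(0L, n_levels - length(topN))))
  out[seq_len(n_levels)]
}

splits_per_level(c(3, 2), n_levels = 5)
splits_per_level(2, n_levels = 4)
```

The first call corresponds to topN=c(3,2) and the second to a topN of length 1.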
level |
the maximum depth of the trees. If level is set to 1, the trees
consist of the root node only. |
st |
step parameter for method "grid" . |
cl.level |
an internal parameter of parallel computing. |
tol |
parameter that is used only if topn.method is set to
"single". |
score |
Specifies the method for model selection. This can be 1 (default), 2 or 3.
If it is 1, the weighted correct classification rate will be used;
if it is 2, the sort-function will be used;
if it is 3, the weight-function will be used:
score = 0.25*scale(dev.tr)+0.6*scale(fit.tr)+0.15*(structure) |
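To make the arithmetic of the weight-function concrete, here is a minimal base-R sketch that evaluates the formula on made-up numbers. The four values per vector are invented for illustration, and the interpretation of dev.tr, fit.tr and the structure term in the comments is an assumption; this page does not define them, and the code is not taken from the TWIX sources.

```r
# Made-up per-tree quantities for four candidate trees (illustration only).
dev.tr         <- c(120.5, 98.2, 110.0, 101.7)  # assumed: deviance on training data
fit.tr         <- c(0.91, 0.94, 0.92, 0.95)     # assumed: fit on training data
structure_term <- c(0.2, 0.5, 0.3, 0.4)         # assumed: some structure measure

# scale() centres and standardises each vector; as.numeric() drops the
# matrix attributes that scale() returns.
score <- 0.25 * as.numeric(scale(dev.tr)) +
         0.60 * as.numeric(scale(fit.tr)) +
         0.15 * structure_term
score
```

Because scale() centres its input, the first two terms each sum to zero across trees, so only their relative ordering contributes to the score.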
k |
k-fold cross-validation of the split function. k specifies the proportion of observations that will be taken into the hold-out sample (k can be in (0,0.5)). |
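As a rough illustration of what a hold-out proportion means, the base-R sketch below sets aside a fraction k of the row indices. How TWIX draws its hold-out sample internally is not stated on this page, so the sampling scheme and the names n, hold and keep are assumptions made for this example only.

```r
# Illustration only: set aside a proportion k of n observations.
set.seed(1)
n <- 100          # assumed number of observations
k <- 0.25         # hold-out proportion, inside (0, 0.5)

hold <- sample(n, size = floor(k * n))  # indices of the hold-out sample
keep <- setdiff(seq_len(n), hold)       # remaining indices

length(hold)
length(keep)
```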
verbose |
A logical for printing a training log. |
trace.plot |
Should a trace plot be plotted? |
... |
further arguments to be passed to or from methods. |
This implementation cannot handle missing values. Therefore, cases with missing values must be removed. For p-value adjusted classification trees, continuous and binary independent variables are supported as predictors, and the response variable must be categorical with two categories.
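Cases with missing values can be removed with base R before calling TWIX. The sketch below uses a small made-up data frame; the variable names y, x1 and x2 are placeholders.

```r
# Small made-up data frame with missing values (illustration only).
df <- data.frame(
  y  = factor(c("a", "b", "a", "b", "a")),
  x1 = c(1.2, NA, 3.4, 4.1, 5.0),
  x2 = c(10, 20, NA, 40, 50)
)

# Keep only the rows that contain no NA in any column.
df_complete <- df[complete.cases(df), ]

# na.omit() drops the same rows.
nrow(na.omit(df)) == nrow(df_complete)
nrow(df_complete)
```

Either df_complete or the result of na.omit(df) could then be passed as the data argument.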
a list with the following components:
call |
the call generating the object. |
trees |
a list of all constructed trees, which include ID, Dev ... for each tree. |
greedy.tree |
greedy tree |
multitree |
database |
agg.id |
vector specifying trees for aggregation. |
Bad.id |
ID vector of bad observations from the training data. |
Martin Theus, Sergej Potapov and Simon Urbanek (2006).
TWIX (Talk given at the 3rd Ensemble Workshop in Munich 2006).
http://theusrus.de/Talks/Talks/TWIX.pdf
Lausen, B., Hothorn, T., Bretz, F. and Schumacher, M. (2004).
Assessment of Optimally Selected Prognostic Factors.
Biometrical Journal 46, 364-374.
get.tree, predict.TWIX, print.single.tree, plot.TWIX, bootTWIX
data(olives)

### train and test data
set.seed(123)
i <- sample(572,150)
ic <- setdiff(1:572,i)
training <- olives[ic,]
test <- olives[i,]

TM <- TWIX(Region~.,data=training[,1:9],topN=c(4,3),method="local")
TM$trees

get.tree(TM,1)
pred <- predict(TM,newdata=test,sq=1)

predict(TM,newdata=test,sq=1:2,ccr=TRUE)$CCR

###
### the p-value adjusted classification tree
library(mlbench)
data(PimaIndiansDiabetes2)
Pima <- na.omit(PimaIndiansDiabetes2)

### train and test data
set.seed(1111)
N <- nrow(Pima)
icv <- sample(N,N/3)
itr <- setdiff(1:N,icv)
train <- Pima[itr,]
test <- Pima[icv,]

TMa <- TWIX(diabetes~.,data=train,splitf="p-adj",Devmin=0.05)
get.tree(TMa)

predict(TMa,newdata=test,sq=1,ccr=TRUE)$CCR