TWIX {TWIX}    R Documentation

Trees with extra splits

Description

Trees with extra splits

Usage

TWIX(formula, data = NULL, test.data = NULL, subset = NULL,
        method = "deviance", topn.method = "complete",
        cluster = NULL, minsplit = 30, minbucket = round(minsplit/3),
        Devmin = 0.05, splitf = "deviance", topN = 1, level = 30,
        st = 1, cl.level = 2, tol = 0.15, score = 1, k = 0, 
        verbose=FALSE, trace.plot=FALSE, ...)

Arguments

formula formula of the form y ~ x1 + x2 + ..., where y must be a factor and x1,x2,... are numeric or factor.
data an optional data frame containing the variables in the model (training data).
test.data an optional data frame containing new data (test data).
subset an optional vector specifying a subset of observations to be used.
method Which split points will be used? This can be "deviance" (default), "grid" or "local". If the method is set to:
"local" - the program uses the local maxima of the split function (entropy),
"deviance" - all values of the entropy,
"grid" - grid points.
topn.method one of "complete" (default) or "single". Specifies how candidate split points are considered. If set to "complete", split points from all variables are used; otherwise split points are used per variable.
cluster the name of the cluster, if parallel computing is to be used.
minsplit the minimum number of observations that must exist in a node.
minbucket the minimum number of observations in any terminal (leaf) node.
Devmin the minimum improvement in entropy required for a split. If splitf is set to "p-adj", Devmin is the significance level alpha.
splitf the kind of splitting function to be used. It can be one of "deviance" (default) or "p-adj". If set to "p-adj", a p-value adjusted classification tree is fitted.
topN integer vector. How many splits will be selected and at which level? If length 1, the same number of splits is selected at each level. If length > 1, for example topN=c(3,2), 3 splits are chosen at the first level, 2 splits at the second level and 1 split at every subsequent level.
level the maximum depth of the trees. If level is set to 1, the trees consist of the root node only.
st step parameter for method "grid".
cl.level an internal parameter of parallel computing.
tol parameter that is used only if topn.method is set to "single".
score specifies the method for model selection. This can be 1 (default), 2 or 3. If score is set to:
1 - the weighted correct classification rate is used,
2 - the sort-function is used,
3 - the weight-function is used:
score = 0.25*scale(dev.tr) + 0.6*scale(fit.tr) + 0.15*(structure)
k k-fold cross-validation of the split function. k specifies the fraction of observations that is taken into the hold-out sample (k can lie in (0, 0.5)).
verbose A logical for printing a training log.
trace.plot Should a trace plot be plotted?
... further arguments to be passed to or from methods.
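
The topN rule described above can be made concrete with a small base-R sketch. The helper `expand_topN` and its argument `n_levels` are hypothetical and not part of the TWIX package; the sketch only restates the documented rule (a single value applies to every level, a longer vector is used level by level, and any remaining levels keep 1 split). It assumes `n_levels` counts the levels at which splits are selected.

```r
# Hypothetical helper (not part of TWIX): expand topN to one value per level.
expand_topN <- function(topN, n_levels) {
  stopifnot(length(topN) >= 1, n_levels >= 1)
  if (length(topN) == 1) {
    # A single value applies to every level.
    return(rep(topN, n_levels))
  }
  # Longer vectors are used level by level; remaining levels keep 1 split.
  out <- rep(1, n_levels)
  n <- min(length(topN), n_levels)
  out[seq_len(n)] <- topN[seq_len(n)]
  out
}

expand_topN(c(3, 2), n_levels = 5)   # 3 2 1 1 1
expand_topN(2, n_levels = 4)         # 2 2 2 2
```

Larger topN values at the first levels widen the set of candidate trees most, since every extra split near the root multiplies the number of subtrees grown below it.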

Details

This implementation cannot handle missing values. Therefore, cases with missing values must be removed before fitting. For p-value adjusted classification trees, only continuous and binary predictors are supported, and the response variable must be categorical with two categories.
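
As a minimal sketch of removing incomplete cases before fitting, the following uses only base R on a made-up data frame (the names `df`, `df_complete` and `df_omit` are illustrative and do not come from the package):

```r
# Toy data frame with missing values (made-up data for illustration).
df <- data.frame(
  y  = factor(c("a", "b", "a", "b", "a")),
  x1 = c(1.2, NA, 3.4, 0.7, 2.1),
  x2 = factor(c("u", "v", NA, "u", "v"))
)

# Keep only the rows without any missing value.
df_complete <- df[complete.cases(df), ]

# na.omit() keeps the same rows and records the dropped ones as an attribute.
df_omit <- na.omit(df)

nrow(df_complete)    # 3
anyNA(df_complete)   # FALSE
```

Either result can then be passed as the data argument; the second example in the Examples section uses na.omit() in the same way.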

Value

A list with the following components:

call the call generating the object.
trees a list of all constructed trees, which include ID, Dev ... for each tree.
greedy.tree greedy tree
multitree database
agg.id vector specifying trees for aggregation.
Bad.id ID vector of bad observations from the training data.

References

Martin Theus, Sergej Potapov and Simon Urbanek (2006).
TWIX (Talk given at the 3rd Ensemble Workshop in Munich 2006).
http://theusrus.de/Talks/Talks/TWIX.pdf

Lausen, B., Hothorn, T., Bretz, F. and Schumacher, M. (2004).
Assessment of Optimally Selected Prognostic Factors.
Biometrical Journal 46, 364-374.

See Also

get.tree, predict.TWIX, print.single.tree, plot.TWIX, bootTWIX

Examples

    library(TWIX)
    data(olives)

    ### train and test data
    set.seed(123)
    i <- sample(572,150)
    ic <- setdiff(1:572,i)
    training <- olives[ic,]
    test <- olives[i,]

    TM <- TWIX(Region~.,data=training[,1:9],topN=c(4,3),method="local")
    TM$trees
    get.tree(TM,1)
    pred <- predict(TM,newdata=test,sq=1)
    predict(TM,newdata=test,sq=1:2,ccr=TRUE)$CCR

    ###
    ### the p-value adjusted classification tree

    library(mlbench)
    data(PimaIndiansDiabetes2)
    Pima <- na.omit(PimaIndiansDiabetes2)

    ### train and test data
    set.seed(1111)
    N <- nrow(Pima)
    icv <- sample(N,floor(N/3))
    itr <- setdiff(1:N,icv)
    train <- Pima[itr,]
    test <- Pima[icv,]
  
    TMa <- TWIX(diabetes~.,data=train,splitf="p-adj",Devmin=0.05)
    get.tree(TMa)
    predict(TMa,newdata=test,sq=1,ccr=TRUE)$CCR

[Package TWIX version 0.2.6 Index]