trim.oblique.tree {oblique.tree} | R Documentation |
Determines a sequence of concise subtrees of the supplied tree by recursively “trimming” off the least important attributes used in oblique splits.
trim.oblique.tree( tree, best = NULL, newdata, trim.impurity = c("deviance", "misclass"), trim.depth = c("partial", "complete"), eps = 1e-3)
tree |
Fitted model object of class oblique.tree. This is assumed to be the result of some function that produces an object with the same named components as that returned by oblique.tree. |
best |
Requests the complexity (i.e. 1 + number of attributes used throughout the tree) of the concise subtree of tree to return (best a scalar) or an (optional) sequence of concise subtrees (best a vector). If missing, best is determined algorithmically. If there is no tree in the sequence of the requested size, the next largest is returned. |
newdata |
Data frame upon which the sequence of cost-complexity subtrees is evaluated. If missing, the data used to grow the tree are used. |
trim.impurity |
Character string denoting the measure of node heterogeneity used to guide tree trimming. The default is "deviance" and the alternative is "misclass" (number of misclassifications or total loss). |
trim.depth |
A character string denoting whether oblique splits should be trimmed towards axis-parallel splits ("partial") or all the way to the constant predictor ("complete"). |
eps |
A lower bound for the probabilities, used to compute deviances if events of predicted probability zero occur in newdata. |
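To illustrate the role of eps, here is a minimal sketch of a deviance in which predicted probabilities are bounded below by eps. It is not the package's internal code: the helper name clamped_deviance and the toy probabilities are assumptions made for illustration only.

```r
# Hypothetical helper (not part of oblique.tree): deviance of predicted
# class probabilities, with probabilities bounded below by eps.
clamped_deviance <- function(probs, truth, eps = 1e-3) {
  # probs: matrix of predicted class probabilities, one row per observation,
  #        with columns named by class label
  # truth: factor or character vector of observed classes
  idx <- cbind(seq_along(truth), match(as.character(truth), colnames(probs)))
  p_true <- pmax(probs[idx], eps)  # avoids log(0) for "impossible" events
  -2 * sum(log(p_true))
}

# Toy data: the second observation has predicted probability 0 for its class
probs <- matrix(c(0.9, 0.1,
                  0.0, 1.0),
                nrow = 2, byrow = TRUE,
                dimnames = list(NULL, c("No", "Yes")))
truth <- c("No", "No")

clamped_deviance(probs, truth)           # finite, because of the lower bound
clamped_deviance(probs, truth, eps = 0)  # Inf
```

Without the lower bound, a single observation whose class was given probability zero makes the total deviance infinite, so every subtree in the sequence would compare as equally bad.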
Determines a sequence of concise subtrees of the supplied tree by recursively "trimming" its splits, based upon the cost-complexity measure.
If best is supplied, the optimal subtree for that value is returned.
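The "next largest" rule described for best can be sketched as follows. This is an illustration only: the helper pick_complexity and the complexity values are hypothetical, and the behaviour when no subtree is large enough is an assumption rather than something the documentation specifies.

```r
# Hypothetical helper (not part of oblique.tree): choose which complexity in
# the trim sequence satisfies a requested `best`. If `best` is not present,
# fall back to the next largest complexity in the sequence.
pick_complexity <- function(comp, best) {
  candidates <- comp[comp >= best]
  if (length(candidates) == 0) {
    # Assumed behaviour for this sketch only
    stop("no subtree of the requested complexity or larger")
  }
  min(candidates)
}

comp <- c(8, 6, 5, 3, 1)  # illustrative complexities, not real output

pick_complexity(comp, 5)  # 5: exact match
pick_complexity(comp, 4)  # 5: no subtree of complexity 4, next largest is used
```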
The response as well as the predictors referred to in the right side of the formula in tree must be present by name in newdata. These data are dropped down each tree in the trim sequence, and deviances or losses are calculated by comparing the supplied response to the prediction.
A plot method exists for objects of this class. It displays the value of the deviance, the number of misclassifications or the total loss for each subtree in the trim sequence. An additional axis displays the values of the cost-complexity parameter at each subtree.
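Because the response and all predictors must be present by name in newdata, it can help to check a data frame before passing it in. The following is a minimal sketch using base R only; the helper missing_vars and the toy data values are assumptions for illustration and are not part of the package.

```r
# Hypothetical pre-flight check (not part of oblique.tree): list the variables
# named in a formula that are absent from a candidate `newdata`.
missing_vars <- function(formula, newdata) {
  setdiff(all.vars(formula), names(newdata))
}

f <- type ~ glu + bmi  # explicit formula; made-up values below

ok  <- data.frame(type = factor(c("No", "Yes")),
                  glu  = c(90, 160),
                  bmi  = c(24, 35))
bad <- ok[, c("glu", "bmi")]  # response column dropped

missing_vars(f, ok)   # character(0): everything needed is present
missing_vars(f, bad)  # "type": the response is missing
```

Note that all.vars() does not expand the "." shorthand, so a formula such as type ~ . would need its predictors spelled out before applying this check.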
If best is a scalar, a c("oblique.tree","tree") object of size best is returned. Otherwise an object of class c("trim", "trim.sequence") is returned. The object contains the following components:
comp |
The complexity of each tree in the cost-complexity pruning sequence. |
dev |
Total deviance of each tree in the cost-complexity pruning sequence. |
h |
The value of the cost-complexity pruning parameter of each tree in the sequence. |
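The components above can be used to choose a value for best, for instance the complexity at which the deviance is smallest. The sketch below works on a mock list with the documented component names; every number in it is made up, and a real object returned by trim.oblique.tree may carry additional components or attributes.

```r
# Mock object with the documented components; all values are invented.
trim.seq <- list(comp = c(8, 7, 5, 3, 1),
                 dev  = c(210.4, 205.9, 208.3, 231.0, 343.6),
                 h    = c(0.0, 1.2, 3.5, 9.8, 40.1))

# Complexity of the subtree with the smallest deviance
best.comp <- trim.seq$comp[which.min(trim.seq$dev)]
best.comp  # 7 for these made-up numbers
```

With a real trim sequence, the selected complexity could then be passed back as the best argument of trim.oblique.tree to obtain that subtree.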
A. Truong
Truong, A. (2009). Fast Growing and Interpretable Oblique Trees via Probabilistic Models.
library(oblique.tree)

# grow a tree on the Pima Indian dataset
data(Pima.tr, package = "MASS")
ob.tree <- oblique.tree(formula = type ~ ., data = Pima.tr, oblique.splits = "only")
plot(ob.tree); text(ob.tree); title(main = "Full Oblique Tree")

# partial trimming
# examine the tree sequence
trim.seq <- trim.oblique.tree(tree = ob.tree)
print(trim.seq); plot(trim.seq)

# examine test error over the trim sequence
data(Pima.te, package = "MASS")
trim.seq <- trim.oblique.tree(tree = ob.tree, newdata = Pima.te)
print(trim.seq); plot(trim.seq)

# deviance is least when best = 7
p.trimmed <- trim.oblique.tree(tree = ob.tree, best = 7)
plot(p.trimmed); text(p.trimmed); title(main = "Partially Trimmed Tree")

# complete trimming
# examine the tree sequence
trim.seq <- trim.oblique.tree(tree = ob.tree, trim.depth = "complete")
print(trim.seq); plot(trim.seq)

# examine test error over the trim sequence
trim.seq <- trim.oblique.tree(tree = ob.tree, trim.depth = "complete", newdata = Pima.te)
print(trim.seq); plot(trim.seq)

# deviance is least when best = 9
c.trimmed <- trim.oblique.tree(tree = ob.tree, trim.depth = "complete", best = 9)
plot(c.trimmed); text(c.trimmed); title(main = "Completely Trimmed Tree")