max.subtree {randomSurvivalForest} | R Documentation |
Extract maximal subtree information from a forest. Used for variable selection and identifying interactions between variables.
max.subtree(object, max.order = 2, sub.order = FALSE, ...)
object | An object of class (rsf, grow) or (rsf, forest). Requires forest = TRUE in the original rsf call. |
max.order | Non-negative integer specifying the target number of order depths. The default returns the first and second order depths. Used to identify predictive variables. See details below. |
sub.order | Set this value to TRUE to return the minimal depth of each variable relative to another variable. Used to identify interrelationships between variables. See details below. |
... | Further arguments passed to or from other methods. |
The maximal subtree for a variable x is the largest subtree whose
root node splits on x. Thus, all nodes above x's maximal subtree
split on variables other than x. The largest possible maximal subtree
is the entire tree, which occurs when the root node splits on x. In
general, however, there can be more than one maximal subtree for a
variable. A maximal subtree may also not exist if there are no splits
on the variable. More details can be found in Ishwaran et al. (2009).
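The definition above can be illustrated with a small base R sketch. The tree, its node layout, and the helper names ancestors and maximal.roots are hypothetical and are not part of the randomSurvivalForest package; the sketch only assumes that a maximal subtree is rooted at a node that splits on x and has no node above it that also splits on x.

```r
# Toy binary tree (hypothetical): one row per node.
# 'parent' is the parent node id (NA for the root); 'split' is the
# splitting variable (NA for terminal nodes).
tree <- data.frame(
  node   = 1:9,
  parent = c(NA, 1, 1, 2, 2, 3, 3, 4, 4),
  split  = c("age", "karno", "karno", "karno", NA, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Ids of all nodes above a given node, walking up to the root.
ancestors <- function(tree, id) {
  out <- integer(0)
  p <- tree$parent[tree$node == id]
  while (!is.na(p)) {
    out <- c(out, p)
    p <- tree$parent[tree$node == p]
  }
  out
}

# Root nodes of the maximal subtrees for variable x: nodes that split
# on x and have no node above them that also splits on x.
maximal.roots <- function(tree, x) {
  cand <- tree$node[!is.na(tree$split) & tree$split == x]
  keep <- vapply(cand, function(id) {
    anc <- ancestors(tree, id)
    !any(tree$split[tree$node %in% anc] == x, na.rm = TRUE)
  }, logical(1))
  cand[keep]
}

print(maximal.roots(tree, "age"))       # one maximal subtree, at the root
print(maximal.roots(tree, "karno"))     # more than one maximal subtree
print(maximal.roots(tree, "celltype"))  # no splits, so no maximal subtree
```

In this toy tree, node 4 also splits on karno but lies inside the maximal subtree rooted at node 2, so it does not start a maximal subtree of its own.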
The minimal depth of a maximal subtree measures predictiveness of a
variable x. It equals the shortest distance (the depth) from the root
node to the parent node of the maximal subtree (zero is the smallest
value possible). The smaller the minimal depth, the more impact x has
on prediction. The second order depth is the shortest distance from
the root node to the second node split using x. To specify the target
order depth, use the max.order option (e.g., setting max.order = 2
returns the first and second order depths).
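As a rough illustration of order depths, the base R sketch below computes them for a single hypothetical tree. It rests on two assumptions that the text does not confirm: that depth is measured to the node that splits on x (with the root at depth zero), and that the k-th order depth is the k-th smallest such depth. The helper names node.depth and order.depths, and the use of NA when a variable has fewer than max.order splits, are choices made for this sketch only.

```r
# Toy binary tree (hypothetical): one row per node.
tree <- data.frame(
  node   = 1:9,
  parent = c(NA, 1, 1, 2, 2, 3, 3, 4, 4),
  split  = c("age", "karno", "karno", "karno", NA, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Depth of a node: number of edges between it and the root (root is 0).
node.depth <- function(tree, id) {
  d <- 0
  p <- tree$parent[tree$node == id]
  while (!is.na(p)) {
    d <- d + 1
    p <- tree$parent[tree$node == p]
  }
  d
}

# Sorted depths of every node that splits on x, truncated or padded
# with NA so that exactly max.order values are returned.
order.depths <- function(tree, x, max.order = 2) {
  ids <- tree$node[!is.na(tree$split) & tree$split == x]
  d <- sort(vapply(ids, function(id) node.depth(tree, id), numeric(1)))
  d[seq_len(max.order)]
}

print(order.depths(tree, "age"))
print(order.depths(tree, "karno"))
print(order.depths(tree, "celltype"))
```

A forest-level summary would average such per-tree values over all trees, which is what the returned order component is described as containing.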
Set sub.order = TRUE to obtain the minimal depth of a variable
relative to another variable. This returns a p x p matrix, where p is
the number of variables, and entry [i][j] is the normalized relative
minimal depth of variable [j] within the maximal subtree for variable
[i], where normalization adjusts for the size of [i]'s maximal
subtree. Entry [i][i] is the normalized minimal depth of i relative to
the root node. The matrix should be read across rows (not down
columns) and identifies interrelationships between variables. Small
[i][j] entries indicate interactions. See find.interaction for
further details.
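Reading the matrix across rows might look like the following base R sketch. The 3 x 3 matrix is made up for illustration and is not output from a real forest, and closest.partner is a hypothetical helper; it simply reports, for each row i, the other variable j with the smallest entry [i][j].

```r
# Made-up 3 x 3 matrix standing in for a sub.order = TRUE result;
# the numbers are illustrative only.
vars <- c("age", "karno", "celltype")
sub <- matrix(c(0.10, 0.25, 0.80,
                0.30, 0.05, 0.70,
                0.90, 0.60, 0.40),
              nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# Read across rows: for row i, ignore the diagonal entry [i][i] and
# find the variable j with the smallest relative minimal depth.
closest.partner <- function(sub) {
  off <- sub
  diag(off) <- NA                      # which.min() skips NA entries
  partner <- colnames(off)[apply(off, 1, which.min)]
  names(partner) <- rownames(off)
  partner
}

print(closest.partner(sub))
```

Because the matrix is read by row, the pairing need not be symmetric: here celltype's closest partner is karno, while karno's closest partner is age.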
For competing risk data, the analysis corresponds to unconditional values (i.e., values that are not event-specific).
A list with the following components:
mean | Minimal depth averaged over a tree and forest for each variable. Vector of size p. |
order | Order depths for a given variable up to max.order, averaged over a tree and the forest. Matrix of dimension p x max.order. If max.order = 0, a matrix of dimension p x ntree is returned containing the minimum maximal subtree distance for each variable by tree. |
count | Averaged number of maximal subtrees, normalized by the size of a tree, for each variable. Vector of size p. |
terminal | Average terminal depth of each tree. Vector of size ntree. |
nodesAtDepth | Number of nodes per depth per tree. Matrix of dimension maxDepth x ntree. |
subOrder | Average minimal depth of a variable relative to another variable. Matrix of dimension p x p. Can be NULL. |
threshold | Threshold used to select variables. Variables whose minimal depth exceeds this value are considered to be weak variables. See Ishwaran et al. (2009). |
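The mean and threshold components can be combined to separate weak from non-weak variables, as in the base R sketch below. The list v is a hypothetical stand-in for a max.subtree result with made-up values, and it assumes that the mean vector is named by variable, which the text above does not state.

```r
# Hypothetical stand-in for the list returned by max.subtree(); only
# the 'mean' and 'threshold' components are mocked, with made-up values.
v <- list(
  mean      = c(age = 2.1, karno = 0.9, celltype = 1.4, prior = 4.8),
  threshold = 3.0
)

# Variables whose minimal depth exceeds the threshold are treated as weak.
weak   <- names(v$mean)[v$mean >  v$threshold]
strong <- names(v$mean)[v$mean <= v$threshold]

# Remaining variables, most predictive (smallest minimal depth) first.
print(sort(v$mean[strong]))
print(weak)
```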
Hemant Ishwaran hemant.ishwaran@gmail.com and Udaya B. Kogalur kogalurshear@gmail.com
H. Ishwaran, U.B. Kogalur, E.Z. Gorodeski, A.J. Minn and M.S. Lauer (2009). High-dimensional variable selection for survival data. Manuscript.
find.interaction, varSel.
## Not run:
#------------------------------------------------------------------------
# First and second order depths for all variables

data(veteran, package = "randomSurvivalForest")
veteran.out <- rsf(Survrsf(time, status) ~ ., data = veteran, forest = TRUE)
v <- max.subtree(veteran.out)

# first and second order depths
print(round(v$order, 3))

# weak variables have minimal depth greater than the following threshold
print(v$threshold)

## End(Not run)