find.interaction {randomSurvivalForest} | R Documentation |
Test for pairwise interactions between variables by comparing pairwise importance values to additive individual importance values.
find.interaction(object, predictorNames = NULL, sorted = TRUE, npred = NULL, subset = NULL, nrep = 1, rough = FALSE, importance = c("randomsplit", "permute")[1], seed = NULL, do.trace = FALSE, ...)
object |
An object of class (rsf, grow) or (rsf, forest). Note: forest = TRUE must be used in the original rsf call. |
predictorNames |
Character vector of variable names to be considered. Default is to use all variables. |
sorted |
Should variables be sorted by importance values? Only applies when predictorNames = NULL. |
npred |
Use the first npred variables as ordered by VIMP (only applies when predictorNames = NULL). Default uses all variables. |
subset |
Indices indicating which rows of the predictor matrix to use (note: this applies to the object's predictor matrix, predictors). Default is to use all rows. |
nrep |
Number of Monte Carlo replicates. |
rough |
Logical value indicating whether fast approximation should be used. Default is FALSE. |
importance |
Method used to compute variable importance (VIMP). |
seed |
Seed for random number generator. Must be a negative integer (the R wrapper handles incorrectly set seed values). |
do.trace |
Logical. Should trace output be enabled? Default is FALSE. Integer values can also be passed. A positive value causes output to be printed every do.trace iterations. |
... |
Further arguments passed to or from other methods. |
Using a previously grown forest, identify pairwise interactions for all pairs of variables from a specified list. Two variables are paired and their paired VIMP is calculated (referred to as 'Paired' importance). The VIMP for each separate variable is also calculated. The sum of these two values is referred to as 'Additive' importance. A large positive or negative difference between 'Paired' and 'Additive' indicates an association worth pursuing if the VIMPs for each variable are reasonably large (Ishwaran, 2007).
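As a rough illustration of the comparison described above, the sketch below computes 'Additive' importance and the difference between 'Paired' and 'Additive' from made-up VIMP values. The variable names, column labels, and numbers are hypothetical and are not output from find.interaction; only the arithmetic follows the description.

```r
# Hypothetical VIMP values (illustrative numbers only, not produced by
# randomSurvivalForest).
vimp.x1     <- 0.040  # VIMP of the first variable on its own
vimp.x2     <- 0.025  # VIMP of the second variable on its own
vimp.paired <- 0.090  # assumed paired VIMP for the two variables together

# 'Additive' importance: sum of the two individual VIMP values.
additive <- vimp.x1 + vimp.x2

# Difference between 'Paired' and 'Additive' importance.
difference <- vimp.paired - additive

# Collect the values in a one-row table (column labels are hypothetical).
interaction.row <- data.frame(
  Var1 = vimp.x1, Var2 = vimp.x2,
  Paired = vimp.paired, Additive = additive, Difference = difference,
  row.names = "x1:x2"
)
print(interaction.row)
```

With these numbers the difference is positive and both individual VIMP values are nonzero, which is the kind of pattern the text suggests following up.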
Depending on the size of the data, computations might be slow. In such cases, users should consider setting npred to a smaller number, or restricting the analysis to a subset of the data.
If nrep is greater than 1, the analysis is repeated nrep times and results averaged over the replications.
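The averaging over replications can be pictured with a minimal sketch. It assumes each replicate yields a table of the same shape and that the tables are averaged element-wise; the tables here are filled with random numbers purely for illustration, and the names (pair.names, replicates, averaged) are hypothetical rather than part of the package.

```r
# Hypothetical per-replicate interaction tables: rows are variable pairs,
# columns are 'Paired' and 'Additive' importance. Values are random
# placeholders, not real VIMP.
set.seed(1)
nrep <- 3
pair.names <- c("x1:x2", "x1:x3", "x2:x3")
replicates <- lapply(seq_len(nrep), function(i) {
  matrix(runif(6), nrow = 3,
         dimnames = list(pair.names, c("Paired", "Additive")))
})

# Element-wise average over the nrep replicate tables.
averaged <- Reduce(`+`, replicates) / nrep
print(averaged)
```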
Note that find.interaction calls the lower level function interaction.rsf. For programming purposes, users may consider doing likewise.
Invisibly, the interaction table.
Hemant Ishwaran hemant.ishwaran@gmail.com and Udaya B. Kogalur ubk2101@columbia.edu
H. Ishwaran (2007). Variable importance in binary regression trees and forests, Electronic J. Statist., 1:519-537.
interaction.rsf.
#------------------------------------------------------------------------
# Explore relationship between the top two predictors from veteran data.

data(veteran, package = "randomSurvivalForest")
v.out <- rsf(Survrsf(time, status) ~ ., veteran, ntree = 1000, forest = TRUE)
find.interaction(v.out, npred = 2, nrep = 1)

## Not run:
#------------------------------------------------------------------------
# All pairwise interactions: PBC data.
# Use fast approximation to speed up computations.

data(pbc, package = "randomSurvivalForest")
rsf.out <- rsf(Survrsf(days, status) ~ ., pbc, ntree = 1000, forest = TRUE)
find.interaction(rsf.out, nrep = 3, rough = TRUE)

## End(Not run)