find.interaction {randomSurvivalForest}    R Documentation

Find Interactions Between Pairs of Variables

Description

Test for pairwise interactions between variables by comparing pairwise importance values to additive individual importance values.

Usage

    find.interaction(object,
                     predictorNames = NULL,
                     sorted = TRUE,
                     npred = NULL,
                     subset = NULL,
                     nrep = 1,
                     rough = FALSE,
                     importance = c("randomsplit", "permute")[1],
                     seed = NULL,
                     do.trace = FALSE,
                     ...)

Arguments

object An object of class (rsf, grow) or (rsf, forest). Note: the original rsf call must have used forest = TRUE.
predictorNames Character vector of variable names to be considered. Default is to use all variables.
sorted Should variables be sorted by importance values? Only applies when predictorNames=NULL.
npred Use the first npred variables as ordered by VIMP (only applies when predictorNames=NULL). Default uses all variables.
subset Indices indicating which rows of the predictor matrix are to be used (note: this applies to the predictor matrix stored in the object, predictors). Default is to use all rows.
nrep Number of Monte Carlo replicates.
rough Logical value indicating whether a fast approximation should be used. Default is FALSE.
importance Method used to compute variable importance (VIMP): either "randomsplit" (the default) or "permute".
seed Seed for the random number generator. Must be a negative integer (the R wrapper handles incorrectly set seed values).
do.trace Logical. Should trace output be enabled? Default is FALSE. Integer values can also be passed; a positive value causes output to be printed every do.trace iterations.
... Further arguments passed to or from other methods.

Details

Using a previously grown forest, identify pairwise interactions for all pairs of variables from a specified list. Two variables are paired and their paired VIMP is calculated (referred to as 'Paired' importance). The VIMP for each separate variable is also calculated, and the sum of these two values is referred to as 'Additive' importance. A large positive or negative difference between 'Paired' and 'Additive' indicates an association worth pursuing, provided the VIMPs for each variable are reasonably large (Ishwaran, 2007).
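The comparison between 'Paired' and 'Additive' importance amounts to simple arithmetic once the VIMP values are known. The following is a conceptual sketch in base R, not the internal code of find.interaction: the helper paired.vs.additive() is hypothetical and all VIMP values are made up for illustration.

```r
# Conceptual sketch in base R; NOT the internal code of find.interaction.
# paired.vs.additive() is a hypothetical helper and all VIMP values are made up.
paired.vs.additive <- function(vimp, paired) {
  # vimp:   named numeric vector of single-variable VIMP
  # paired: data frame with character columns var1, var2 and a numeric
  #         column paired holding the joint (paired) VIMP of each pair
  additive <- unname(vimp[paired$var1] + vimp[paired$var2])
  data.frame(var1       = paired$var1,
             var2       = paired$var2,
             Paired     = paired$paired,
             Additive   = additive,                  # sum of single VIMPs
             Difference = paired$paired - additive,  # 'Paired' minus 'Additive'
             stringsAsFactors = FALSE)
}

vimp <- c(x1 = 0.040, x2 = 0.025, x3 = 0.010)
paired <- data.frame(var1   = c("x1", "x1", "x2"),
                     var2   = c("x2", "x3", "x3"),
                     paired = c(0.052, 0.050, 0.036),
                     stringsAsFactors = FALSE)

tab <- paired.vs.additive(vimp, paired)
print(tab)
```

In this made-up table, the pair (x1, x3) has a difference of essentially zero, which is consistent with the two variables contributing additively; a pair whose difference is large relative to its single VIMPs would be the kind worth pursuing further.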

Depending on the size of the data, computations might be slow. In such cases, users should consider setting npred to a smaller number, or restricting the analysis to a subset of the data.
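One reason npred matters is that the number of pairs to examine grows quadratically with the number of variables considered. The base R sketch below only counts and lists pairs; it does not call the package, and the variable names are hypothetical.

```r
# Conceptual sketch (base R only): how many variable pairs are examined
# when all pairs among npred variables are considered.
n.pairs <- function(npred) choose(npred, 2)

# Quadratic growth: 1, 10, 45 and 190 pairs for 2, 5, 10 and 20 variables
counts <- sapply(c(2, 5, 10, 20), n.pairs)
print(counts)

# Listing every pair for three hypothetical variables (one pair per row)
pairs <- t(combn(c("x1", "x2", "x3"), 2))
print(pairs)
```

Halving npred therefore cuts the number of pairs by roughly a factor of four.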

If nrep is greater than 1, the analysis is repeated nrep times and results averaged over the replications.
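Averaging over replications can be pictured as an element-by-element mean of the per-replicate tables. This is a minimal base R sketch under that assumption; make.rep() is a hypothetical helper, the values are made up, and the exact layout of the table returned by find.interaction may differ.

```r
# Conceptual sketch (base R only; values are made up): averaging an
# interaction table over nrep = 3 Monte Carlo replicates, element by element.
make.rep <- function(paired, additive) {
  # One-row table for the hypothetical pair x1:x2
  matrix(c(paired, additive), nrow = 1,
         dimnames = list("x1:x2", c("Paired", "Additive")))
}
reps <- list(make.rep(0.050, 0.064),
             make.rep(0.054, 0.066),
             make.rep(0.052, 0.065))

# Sum the replicate tables and divide by the number of replicates
avg <- Reduce(`+`, reps) / length(reps)
print(avg)
```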

Note that find.interaction calls the lower level function interaction.rsf. For programming purposes, users may consider doing likewise.

Value

Invisibly, the interaction table.

Author(s)

Hemant Ishwaran hemant.ishwaran@gmail.com and Udaya B. Kogalur ubk2101@columbia.edu

References

H. Ishwaran (2007). Variable importance in binary regression trees and forests, Electronic J. Statist., 1:519-537.

See Also

interaction.rsf.

Examples

#------------------------------------------------------------------------
# Explore relationship between the top two predictors from veteran data.

data(veteran, package = "randomSurvivalForest")
v.out <- rsf(Survrsf(time, status) ~ ., veteran, ntree = 1000, forest = TRUE)
find.interaction(v.out, npred = 2, nrep = 1)

## Not run: 
#------------------------------------------------------------------------
# All pairwise interactions: PBC data.
# Use fast approximation to speed up computations.

data(pbc, package = "randomSurvivalForest")
rsf.out <- rsf(Survrsf(days, status) ~ ., pbc, ntree = 1000, forest = TRUE)
find.interaction(rsf.out, nrep = 3, rough = TRUE)
## End(Not run)


[Package randomSurvivalForest version 3.5.1 Index]