find.interaction {randomSurvivalForest}    R Documentation

Find Interactions Between Pairs of Variables

Description

Test for pairwise interactions between variables by comparing pairwise importance values to additive individual importance values.

Usage

    find.interaction(object,
                  predictorNames = NULL,
                  sorted = TRUE,
                  npred = NULL,
                  subset = NULL, 
                  nrep = 1,
                  rough = FALSE,
                  importance = c("randomsplit", "permute")[1],
                  ...)

Arguments

object          An object of class (rsf, grow) or (rsf, forest). Note: forest=TRUE must be used in the original rsf call.
predictorNames  Character vector of variable names to be considered. Default is to use all variables.
sorted          Should variables be sorted by importance values? Only applies when predictorNames=NULL. Default is TRUE.
npred           Use the first npred variables as ordered by VIMP. Only applies when predictorNames=NULL. Default is to use all variables.
subset          An index vector indicating which rows should be used. Default is to use all the data.
nrep            Number of Monte Carlo replicates. Default is 1.
rough           Logical value indicating whether a fast approximation should be used. Default is FALSE.
importance      Method used to compute variable importance (VIMP): "randomsplit" (the default) or "permute".
...             Further arguments passed to or from other methods.

Details

Using a previously grown forest, identify pairwise interactions for all pairs of variables from a specified list. Two variables are paired and their paired VIMP is calculated (referred to as 'Paired' importance). The VIMP for each separate variable is also calculated, and the sum of these two values is referred to as 'Additive' importance. A large positive or negative difference between 'Paired' and 'Additive' indicates an association worth pursuing, provided the VIMPs for the individual variables are reasonably large (Ishwaran, 2007).
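
The comparison described above can be sketched in base R. This is only an illustration of the arithmetic, not the package's implementation: the VIMP values below are made up, and the column names of the table are assumptions that need not match what find.interaction returns.

```r
# Hypothetical individual VIMP values (made-up numbers, not rsf output).
vimp <- c(karno = 0.080, celltype = 0.030, age = 0.005)

# All pairs of variables, one pair per row.
pair.names <- t(combn(names(vimp), 2))

# Hypothetical paired VIMP for each pair, in the same row order.
paired <- c(0.125, 0.084, 0.036)

# 'Additive' importance is the sum of the two individual VIMP values.
additive <- unname(vimp[pair.names[, 1]] + vimp[pair.names[, 2]])

# Column names here are assumptions chosen for this sketch.
tab <- data.frame(Var1 = pair.names[, 1],
                  Var2 = pair.names[, 2],
                  Paired = paired,
                  Additive = additive,
                  Difference = paired - additive)

# Rank pairs by the size of the difference, largest first.
ranked <- tab[order(-abs(tab$Difference)), ]
print(ranked)
```

In this sketch the karno and celltype pair shows the largest difference, and both variables also have the largest individual VIMP, which is the situation the text above describes as worth pursuing.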

Computations can be slow on large data sets. In that case, consider setting npred to a smaller number or restricting the analysis to a subset of the rows via subset.

If nrep is greater than 1, the analysis is repeated nrep times and the results are averaged over the replications.
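
Averaging over replicates can be pictured with the following base R sketch. The helper one.replicate is hypothetical and simply simulates a small table of noisy values; it stands in for a single run of the analysis and is not part of the package.

```r
set.seed(1)
nrep <- 3

# Hypothetical stand-in for one Monte Carlo replicate: returns a small
# matrix of simulated 'Paired' and 'Additive' values for three pairs.
one.replicate <- function() {
  matrix(rnorm(6, mean = 0.05, sd = 0.01), nrow = 3,
         dimnames = list(c("a:b", "a:c", "b:c"), c("Paired", "Additive")))
}

# Run the replicates and average the tables element by element.
reps <- replicate(nrep, one.replicate(), simplify = FALSE)
avg <- Reduce(`+`, reps) / nrep
print(avg)
```

Averaging reduces the Monte Carlo noise in each entry, at the cost of roughly nrep times the computation.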

find.interaction calls the lower-level function interaction.rsf. Users writing their own programs may prefer to call interaction.rsf directly.

Value

The interaction table, returned invisibly.
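
Because the value is returned invisibly, it has to be assigned to a name before it can be inspected or reused. The base R sketch below shows the mechanism with a hypothetical stand-in function, fake.interaction, which is not part of the package and returns made-up values.

```r
# Hypothetical stand-in that returns a table invisibly, as described above.
fake.interaction <- function() {
  tab <- data.frame(Paired = 0.125, Additive = 0.110)  # made-up values
  invisible(tab)
}

# Calling the function at top level would not auto-print the result.
# Assigning it keeps the table available for further work.
res <- fake.interaction()
res$Difference <- res$Paired - res$Additive

# withVisible() confirms that the return value is flagged as invisible.
vis <- withVisible(fake.interaction())$visible
```

The same pattern, assigning the result of find.interaction to a name, should apply to the real function.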

Author(s)

Hemant Ishwaran <hemant.ishwaran@gmail.com> and Udaya B. Kogalur <ubk2101@columbia.edu>

References

H. Ishwaran (2007). Variable importance in binary regression trees and forests, Electronic J. Statist., 1:519-537.

See Also

interaction.rsf.

Examples

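# Veteran data: grow a forest and keep it (forest = TRUE) for later use.
data(veteran, package = "randomSurvivalForest")
v.out <- rsf(Survrsf(time, status) ~ ., veteran, ntree = 1000, forest = TRUE)
# Check the pair formed by the two variables with the largest VIMP.
find.interaction(v.out, npred = 2, nrep = 1)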

## Not run: 
# All pairwise interactions: PBC data.
# Use fast approximation to speed up computations.
data(pbc, package = "randomSurvivalForest")
rsf.out <- rsf(Survrsf(days, status) ~ ., pbc, ntree = 1000, forest = TRUE)
find.interaction(rsf.out, nrep = 3, rough = TRUE)
## End(Not run)


[Package randomSurvivalForest version 3.2.3]