interaction.rsf {randomSurvivalForest} | R Documentation |
Calculate variable importance (VIMP) for a single variable or group of variables.
interaction.rsf(object, predictorNames = NULL, subset = NULL, joint = TRUE, rough = FALSE, importance = c("randomsplit", "permute", "none")[1], seed = NULL, do.trace = FALSE, ...)
object |
An object of class (rsf, grow) or (rsf, forest). Note: forest = TRUE must be used in the original rsf call. |
predictorNames |
Character vector of variable names to be considered. Although the default is NULL, this must be specified. |
subset |
An index vector indicating which rows should be used. Default is to use all the data. |
joint |
Should joint-VIMP or individual VIMP be calculated? See details below. |
rough |
Logical value indicating whether fast approximation should be used. Default is FALSE. |
importance |
Method used to compute variable importance (VIMP): one of "randomsplit" (the default), "permute", or "none". See details below. |
seed |
Seed for random number generator. Must be a negative integer (the R wrapper handles incorrectly set seed values). |
do.trace |
Logical. Should trace output be enabled? Default is FALSE. Integer values can also be passed. A positive value causes output to be printed every do.trace iterations. |
... |
Further arguments passed to or from other methods. |
Using a previously grown forest, and restricting the data to that indicated by subset, calculate the VIMP for variables listed in predictorNames. If joint = TRUE, a joint-VIMP is calculated. The joint-VIMP is the importance for the group of variables, when the group is perturbed simultaneously. If joint = FALSE, the VIMP for each variable considered separately is calculated.
Depending upon the option importance, VIMP is calculated either by random daughter assignment ("randomsplit"), by random permutation of the variable(s) ("permute"), or not at all ("none", no perturbing).
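The difference between joint and individual VIMP under permutation can be sketched with a toy example in base R. This is an illustration only, not the package's implementation: a linear model stands in for the forest, mean squared error stands in for the survival error rate, and the helper names mse and perm.vimp are hypothetical. It also assumes that a joint perturbation shuffles the whole group of variables with one shared row permutation.

```r
# Toy illustration only: lm() replaces the forest and MSE replaces the
# OOB survival error rate. mse() and perm.vimp() are hypothetical helpers.
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 2 * dat$x1 + dat$x2 + rnorm(n, sd = 0.5)
fit <- lm(y ~ x1 + x2 + x3, data = dat)

# Prediction error of a fitted model on a data set.
mse <- function(model, data) {
  mean((data$y - predict(model, newdata = data))^2)
}

# Permute the named columns using one shared row shuffle (an assumption
# about what "perturbed simultaneously" means), then return the increase
# in error over the unperturbed data.
perm.vimp <- function(model, data, vars) {
  perturbed <- data
  idx <- sample(nrow(data))
  perturbed[vars] <- data[idx, vars, drop = FALSE]
  mse(model, perturbed) - mse(model, data)
}

vars <- c("x1", "x2")

# Analogue of joint = TRUE: a single number for the whole group.
joint.vimp <- perm.vimp(fit, dat, vars)

# Analogue of joint = FALSE: one value per variable, each perturbed alone.
indiv.vimp <- sapply(vars, function(v) perm.vimp(fit, dat, v))

print(joint.vimp)
print(indiv.vimp)
```

In this sketch the joint value is a single number while the individual values form a named vector, which mirrors the shape of the importance component returned for joint = TRUE and joint = FALSE.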
A list with the following components:
err.rate |
Vector of length ntree containing OOB error
rates for the (unperturbed) ensemble restricted to the subsetted
data. |
importance |
Variable importance (VIMP). Either a vector or a single number depending upon the option joint. |
Hemant Ishwaran hemant.ishwaran@gmail.com and Udaya B. Kogalur ubk2101@columbia.edu
H. Ishwaran (2007). Variable importance in binary regression trees and forests, Electronic J. Statist., 1:519-537.
find.interaction.
# Example of paired-VIMP.
# Veteran data.
data(veteran, package = "randomSurvivalForest")
v.out <- rsf(Survrsf(time,status)~., veteran, ntree = 1000, forest = TRUE)
interaction.rsf(v.out, c("karno","celltype"))$importance

## Not run:
# Individual VIMP for data restricted to events only.
# PBC data.
data(pbc, package = "randomSurvivalForest")
rsf.out <- rsf(Survrsf(days,status)~., pbc, ntree = 1000, forest = TRUE)
o.r <- rev(order(rsf.out$importance))
VIMP <- rsf.out$importance[o.r]
VIMP.events <- rep(0, length(VIMP))
names(VIMP.events) <- names(VIMP)
events <- which(rsf.out$cens == 1)
VIMP.events <- interaction.rsf(rsf.out, names(VIMP), events, joint = FALSE)$importance
VIMP.all <- as.data.frame(cbind(VIMP.events = VIMP.events, VIMP = VIMP))
print(round(VIMP.all, 3))

# PBC data again.
# Monte Carlo estimates for VIMP.
# Bootstrap estimates for VIMP.
VIMP.MC <- VIMP.BOOT <- NULL
for (k in 1:100) {
  VIMP.MC <- cbind(VIMP.MC,
    interaction.rsf(rsf.out, names(VIMP), joint = FALSE)$importance)
  VIMP.BOOT <- cbind(VIMP.BOOT,
    interaction.rsf(rsf.out, names(VIMP),
      subset = sample(1:dim(pbc)[1], replace = TRUE), joint = FALSE)$importance)
}
VIMP.MC <- as.data.frame(cbind(VIMP.mean = apply(VIMP.MC, 1, mean),
                               VIMP.sd = apply(VIMP.MC, 1, sd)))
VIMP.BOOT <- as.data.frame(cbind(VIMP.mean = apply(VIMP.BOOT, 1, mean),
                                 VIMP.sd = apply(VIMP.BOOT, 1, sd)))
rownames(VIMP.MC) <- rownames(VIMP.BOOT) <- names(VIMP)
print(round(VIMP.MC, 3))
print(round(VIMP.BOOT, 3))
## End(Not run)