interaction.rsf {randomSurvivalForest}    R Documentation
Calculate variable importance (VIMP) for a single variable or group of variables.
interaction.rsf(object, predictorNames = NULL, subset = NULL, joint = TRUE, rough = FALSE, importance = c("randomsplit", "permute", "none")[1], seed = NULL, do.trace = FALSE, ...)
object |
An object of class (rsf, grow) or (rsf, forest). Note: forest = TRUE must be used in the original rsf call. |
predictorNames |
Character vector of x-variable names to be considered. If NULL (the default), all variables are used. Only x-variables that appear in the predictor matrix of object are used. |
subset |
Indices indicating which rows of the predictor matrix are to be used (this applies to the predictor matrix stored in object, predictors). Default is to use all rows. |
joint |
Logical. Should joint-VIMP (TRUE, the default) or individual VIMP (FALSE) be calculated? See details below. |
rough |
Logical value indicating whether a fast approximation should be used. Default is FALSE. |
importance |
Method used to compute variable importance (VIMP): one of "randomsplit" (the default), "permute", or "none". See details below. |
seed |
Seed for the random number generator. Must be a negative integer (the R wrapper handles incorrectly set seed values). |
do.trace |
Logical. Should trace output be enabled? Default is FALSE. Integer values can also be passed: a positive value causes output to be printed every do.trace iterations. |
... |
Further arguments passed to or from other methods. |
Using a previously grown forest, and restricting the data to the rows indicated by subset, calculate the VIMP for the variables listed in predictorNames. If joint = TRUE, a joint-VIMP is calculated: the joint-VIMP is the importance of the group of variables when the group is perturbed simultaneously. If joint = FALSE, the VIMP of each variable, considered separately, is calculated.
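To make the joint versus individual distinction concrete, here is a minimal base-R sketch of permutation VIMP (the "permute" idea only). It is an illustration under stated assumptions, not the package's implementation: vimp_sketch() is a hypothetical helper, an lm() fit and held-out mean squared error stand in for the survival forest and its OOB error rate, and importance is taken as perturbed error minus unperturbed error. The returned list borrows the component names err.rate, err.perturb.rate and importance only as a loose analogy.

```r
# Hypothetical helper (not part of randomSurvivalForest): permutation VIMP
# on held-out data. Assumption: an lm() fit and mean squared error stand in
# for the survival forest and its OOB error rate.
vimp_sketch <- function(fit, data, response, vars, joint = TRUE) {
  err <- function(d) mean((d[[response]] - predict(fit, newdata = d))^2)
  base <- err(data)
  perturb <- function(v) {
    d <- data
    idx <- sample(nrow(d))            # one row permutation shared by all of v
    d[v] <- d[idx, v, drop = FALSE]   # scramble the listed column(s)
    err(d)
  }
  perturbed <- if (joint) {
    perturb(vars)                     # whole group perturbed at once
  } else {
    vapply(vars, perturb, numeric(1)) # each variable perturbed on its own
  }
  list(err.rate = base, err.perturb.rate = perturbed,
       importance = perturbed - base)
}

# Simulated data: y depends strongly on x1, weakly on x2, not at all on x3.
set.seed(1)
make_data <- function(n) {
  d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
  d$y <- 2 * d$x1 + d$x2 + rnorm(n)
  d
}
train <- make_data(500)
test  <- make_data(500)
fit <- lm(y ~ x1 + x2 + x3, data = train)

imp_joint <- vimp_sketch(fit, test, "y", c("x1", "x2"), joint = TRUE)$importance
imp_indiv <- vimp_sketch(fit, test, "y", c("x1", "x2", "x3"), joint = FALSE)$importance
print(imp_joint)  # a single number for the group
print(imp_indiv)  # a named vector, one entry per variable
```

With joint = TRUE this sketch scrambles the group with one shared row permutation, which keeps the dependence among the grouped variables intact; permuting each column independently would be another reasonable reading of "perturbed simultaneously".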
Depending on the option importance, VIMP is calculated either by random daughter assignment, by random permutation of the variable(s), or not at all (no perturbing).
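Random daughter assignment (the "randomsplit" option) can be pictured with a toy tree: whenever a case reaches a node that splits on a noised-up variable, it is sent to the left or right daughter at random instead of following the split. The base-R sketch below hand-codes a small regression tree. The tree structure and the helpers leaf(), node(), drop_case() and predict_tree() are hypothetical and do not reflect how randomSurvivalForest stores its forests; mean squared error stands in for the OOB error rate.

```r
# Hypothetical hand-built tree (not the package's internal representation).
leaf <- function(value) list(leaf = TRUE, value = value)
node <- function(var, cut, left, right)
  list(leaf = FALSE, var = var, cut = cut, left = left, right = right)

tree <- node("x1", 0,
             left  = node("x2", 0, leaf(-3), leaf(-1)),
             right = node("x2", 0, leaf(1), leaf(3)))

# Drop one case down the tree. At a node splitting on a variable in
# noise_vars, the case goes to a randomly chosen daughter (fair coin)
# instead of following the cut point.
drop_case <- function(nd, x, noise_vars = character(0)) {
  while (!nd$leaf) {
    go_left <- if (nd$var %in% noise_vars) {
      runif(1) < 0.5
    } else {
      x[[nd$var]] <= nd$cut
    }
    nd <- if (go_left) nd$left else nd$right
  }
  nd$value
}

predict_tree <- function(nd, data, noise_vars = character(0)) {
  vapply(seq_len(nrow(data)),
         function(i) drop_case(nd, data[i, , drop = FALSE], noise_vars),
         numeric(1))
}

# Simulated data generated from the tree itself plus a little noise.
set.seed(2)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
d$y <- predict_tree(tree, d) + rnorm(200, sd = 0.1)

mse <- function(pred) mean((d$y - pred)^2)
base <- mse(predict_tree(tree, d))
# VIMP as the increase in error after noising up one variable.
vimp_x1 <- mse(predict_tree(tree, d, noise_vars = "x1")) - base
vimp_x2 <- mse(predict_tree(tree, d, noise_vars = "x2")) - base
print(c(x1 = vimp_x1, x2 = vimp_x2))
```

Noising up x1 moves cases between leaves that differ by 4, while noising up x2 moves them between leaves that differ by 2, so in this toy setup x1 should come out as the more important variable. Passing both names in noise_vars would be the random-split analogue of a joint-VIMP.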
A list with the following components:
err.rate |
OOB error rate for the (unperturbed) ensemble restricted to the subsetted data. |
err.perturb.rate |
OOB error rate for the perturbed ensemble restricted to the subsetted data. A single number if joint = TRUE, or a vector with one entry per variable if joint = FALSE. |
importance |
Variable importance (VIMP). A single number if joint = TRUE, or a vector with one entry per variable if joint = FALSE. |
Hemant Ishwaran hemant.ishwaran@gmail.com and Udaya B. Kogalur ubk2101@columbia.edu
H. Ishwaran (2007). Variable importance in binary regression trees and forests, Electronic J. Statist., 1:519-537.
See also: find.interaction.
#------------------------------------------------------------------------
# Example of paired-VIMP.
# Veteran data.

data(veteran, package = "randomSurvivalForest")
v.out <- rsf(Survrsf(time, status) ~ ., veteran, ntree = 1000, forest = TRUE)
interaction.rsf(v.out, c("karno", "celltype"))$importance

## Not run:
#------------------------------------------------------------------------
# Individual VIMP for data restricted to events only.
# PBC data.

data(pbc, package = "randomSurvivalForest")
rsf.out <- rsf(Survrsf(days, status) ~ ., pbc, ntree = 1000, forest = TRUE)
o.r <- rev(order(rsf.out$importance))
VIMP <- rsf.out$importance[o.r]
VIMP.events <- rep(0, length(VIMP))
names(VIMP.events) <- names(VIMP)
events <- which(rsf.out$cens == 1)
VIMP.events <- interaction.rsf(rsf.out, names(VIMP), events, joint = FALSE)$importance
VIMP.all <- as.data.frame(cbind(VIMP.events = VIMP.events, VIMP = VIMP))
print(round(VIMP.all, 3))

#------------------------------------------------------------------------
# Estimate variability of VIMP in two ways (PBC data):
# (i) Monte Carlo: estimates variability of the procedure
# (ii) Bootstrap: estimates statistical variability

data(pbc, package = "randomSurvivalForest")
rsf.out <- rsf(Survrsf(days, status) ~ ., pbc, ntree = 1000, forest = TRUE)
o.r <- rev(order(rsf.out$importance))
VIMP <- rsf.out$importance[o.r]
subset.index <- 1:nrow(rsf.out$predictors)
VIMP.mc <- VIMP.boot <- NULL
for (k in 1:100) {
  cat("iteration:", k, "\n")
  VIMP.mc <- cbind(VIMP.mc,
    interaction.rsf(rsf.out, names(VIMP), joint = FALSE)$importance)
  VIMP.boot <- cbind(VIMP.boot,
    interaction.rsf(rsf.out, names(VIMP),
      subset = sample(subset.index, replace = TRUE),
      joint = FALSE)$importance)
}
VIMP.mc <- as.data.frame(cbind(VIMP.mean = apply(VIMP.mc, 1, mean),
                               VIMP.sd = apply(VIMP.mc, 1, sd)))
VIMP.boot <- as.data.frame(cbind(VIMP.mean = apply(VIMP.boot, 1, mean),
                                 VIMP.sd = apply(VIMP.boot, 1, sd)))
rownames(VIMP.mc) <- rownames(VIMP.boot) <- names(VIMP)
print(round(VIMP.mc, 3))
print(round(VIMP.boot, 3))
## End(Not run)