predict.rsf {randomSurvivalForest}    R Documentation

Random Survival Forest Prediction

Description

Prediction on test data using Random Survival Forests.

Usage

  predict.rsf(object = NULL,
              test = NULL,
              importance = c("randomsplit", "permute", "none")[1],
              na.action = c("na.omit", "na.impute")[1],
              proximity = FALSE,
              seed = NULL,
              do.trace = FALSE,
              ...)

Arguments

object An object of class (rsf, grow) or (rsf, forest). Note that forest=TRUE must be used in the original rsf call for prediction to work.
test Data frame containing the test data. Missing values are allowed.
importance Method used to compute variable importance (VIMP). Applies only when the test data contain outcome information.
na.action Action to be taken if the data contain NAs. Possible values are na.omit, which removes an entire record if even one of its entries is NA, and na.impute, which imputes missing values in the test data. See Details below.
proximity Logical. Should the proximity measure between test observations be calculated? The result can be large. Default is FALSE.
seed Seed for the random number generator. Must be a negative integer (the R wrapper handles incorrectly set seed values).
do.trace Logical. Should trace output be enabled? Default is FALSE. Integer values can also be passed: a positive value causes output to be printed every do.trace iterations.
... Further arguments passed to or from other methods.
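A sketch of the idea behind importance = "permute", assuming the usual permutation definition of VIMP: the error after a variable's values are randomly permuted in the test data, minus the error on the unpermuted test data. The functions predict_fun and error_fun, and the toy data, are hypothetical stand-ins for dropping data down the grown forest and computing its test-set error; this is an illustration, not the package's internal code, and the "randomsplit" method is not covered.

```r
# Permutation VIMP: increase in error after shuffling one variable.
# predict_fun and error_fun are hypothetical stand-ins for the forest and
# its error rate on test data that contain outcomes.
permutation_vimp <- function(data, var, predict_fun, error_fun) {
  base_err <- error_fun(data, predict_fun(data))
  shuffled <- data
  # Permute one column only; all other columns keep their original order
  shuffled[[var]] <- shuffled[[var]][sample.int(nrow(data))]
  perm_err <- error_fun(shuffled, predict_fun(shuffled))
  perm_err - base_err
}

# Toy example: the "model" uses x1 only, so permuting x2 cannot change the error
toy <- data.frame(x1 = c(1, 2, 3, 4, 5), x2 = c(5, 3, 1, 4, 2))
toy$y <- 2 * toy$x1
predict_fun <- function(d) 2 * d$x1
error_fun <- function(d, pred) mean((d$y - pred)^2)

set.seed(1)
vimp_x1 <- permutation_vimp(toy, "x1", predict_fun, error_fun)
vimp_x2 <- permutation_vimp(toy, "x2", predict_fun, error_fun)  # exactly 0
```

A clearly positive VIMP suggests the variable contributes to prediction; values near zero (or negative) suggest it does not.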

Details

predict.rsf takes a test data set, drops it down the forest grown from the training data, and computes an ensemble cumulative hazard function (CHF). CHFs are calculated for each individual in the test data at every unique death time point of the original grow (training) data. The overall error rate and the VIMP for each variable are computed for the test data if outcome information is available. Setting na.action = "na.impute" imputes missing test data (x-variables or outcomes) using the forest grown from the training data (Ishwaran et al. 2008). Only the training data are used in imputing the test data, to avoid biasing error rates.
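The ensemble CHF can be sketched in base R following the construction in Ishwaran et al. (2008): each tree contributes the Nelson-Aalen estimate of the CHF computed from the training cases in the terminal node that a test case falls into, and the ensemble CHF is the average of these over the trees. The helper nelson_aalen, the node data, and the time grid (standing in for timeInterest) are hypothetical; this illustrates the idea and is not the package's implementation.

```r
# Nelson-Aalen CHF from one terminal node's training data, evaluated on a grid:
# H(t) = sum over death times s <= t of d(s) / Y(s)
nelson_aalen <- function(time, status, grid) {
  death_times <- sort(unique(time[status == 1]))
  d <- vapply(death_times, function(s) sum(time == s & status == 1), numeric(1))
  Y <- vapply(death_times, function(s) sum(time >= s), numeric(1))  # at risk
  increments <- d / Y
  vapply(grid, function(t) sum(increments[death_times <= t]), numeric(1))
}

# Hypothetical training data in the terminal node reached by one test case,
# in each of three trees (status: 1 = death, 0 = censored)
node_data <- list(
  list(time = c(2, 4, 4, 7), status = c(1, 1, 0, 1)),
  list(time = c(3, 4, 9),    status = c(1, 0, 1)),
  list(time = c(2, 5, 6),    status = c(0, 1, 1))
)
grid <- c(2, 4, 6)  # plays the role of timeInterest

# One column per tree, one row per grid point; average across trees
tree_chf <- sapply(node_data, function(nd) nelson_aalen(nd$time, nd$status, grid))
ensemble_chf <- rowMeans(tree_chf)  # c(1/12, 11/36, 29/36)
```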

Value

An object of class (rsf, predict), which is a list with the following components:

call The original grow call to rsf.
forest The grow forest.
ntree Number of trees in the grow forest.
leaf.count Number of terminal nodes for each tree in the grow forest. Vector of length ntree.
timeInterest Sorted unique event times from grow (training) data. Ensemble values given for these time points only.
n Sample size of the test data (depends on how NAs are handled; see na.action).
ndead Number of deaths in test data (can be NULL).
time Vector of survival times from test data (can be NULL).
cens Vector of censoring indicators from test data (can be NULL).
predictorNames Character vector of variable names.
predictors Data frame comprising x-variables used for prediction.
ensemble Matrix containing the ensemble CHF for the test data. Each row contains one test individual's CHF evaluated at the time points in timeInterest.
mortality Vector containing the ensemble mortality for each individual in the test data. Ensemble mortality values should be interpreted in terms of the total number of deaths in the training data.
err.rate Vector of length ntree containing the error rate on the test data. Can be NULL.
importance VIMP of each variable in the test data. Can be NULL.
proximity If proximity=TRUE, a matrix recording the proximity of the test-data inputs is computed. The value returned is a vector of the lower diagonal of this matrix. Use plot.proximity() to extract this information.
imputedIndv Vector of indices of records in the test data with missing values. Can be NULL.
imputedData Data frame comprising the imputed test data. The first two columns are the censoring indicator and survival time, respectively; the remaining columns are the x-variables. Row i contains the imputed outcomes and x-variables for row imputedIndv[i] of predictors. Can be NULL.
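The relationship between ensemble, mortality and err.rate can be sketched in base R, assuming the definitions in Ishwaran et al. (2008): ensemble mortality is the sum of an individual's ensemble CHF over the time points in timeInterest, and the error rate is 1 minus Harrell's concordance index with mortality as the predicted risk. All values below are hypothetical, ties in survival time are handled in a simplified way, and the package reports err.rate cumulatively as trees are added rather than as the single number computed here.

```r
# Hypothetical stand-ins for predict.rsf output components
ensemble <- rbind(c(0.05, 0.20, 0.60, 1.10),
                  c(0.30, 0.90, 1.50, 2.40),
                  c(0.10, 0.50, 1.00, 1.50))
time <- c(4, 2, 14)  # test-set survival times
cens <- c(1, 1, 0)   # 1 = death, 0 = censored

# Ensemble mortality: sum of each row's CHF over the time points
mortality <- rowSums(ensemble)  # c(1.95, 5.1, 3.1)

# Error rate = 1 - Harrell's C, using mortality as the risk score
harrell_error <- function(time, cens, mortality) {
  concordant <- 0
  permissible <- 0
  for (i in seq_along(time)) {
    for (j in seq_along(time)) {
      # Usable pair: i died strictly before j's (death or censoring) time
      if (cens[i] == 1 && time[i] < time[j]) {
        permissible <- permissible + 1
        if (mortality[i] > mortality[j]) {
          concordant <- concordant + 1    # higher risk died first
        } else if (mortality[i] == mortality[j]) {
          concordant <- concordant + 0.5  # tied predictions count half
        }
      }
    }
  }
  1 - concordant / permissible  # NaN if there are no usable pairs
}

err <- harrell_error(time, cens, mortality)  # 1/3: one of three pairs discordant
```

Larger mortality means higher predicted risk, which is why a pair counts as concordant when the individual who died first has the larger mortality.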

Note

The key deliverable is the matrix ensemble, which contains the ensemble CHF for each individual in the test data evaluated at a set of distinct time points.
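Because ensemble is only given at the time points in timeInterest, a value at an arbitrary time t has to be read off a step function. A minimal sketch, assuming each row is constant between consecutive time points (an average of Nelson-Aalen estimates only jumps at event times) and equal to 0 before the first one; the matrix, the time points and the helper chf_at are hypothetical. Converting to survival with S(t) = exp(-H(t)) is one common convention, not something this page specifies.

```r
# Hypothetical stand-ins for veteran.pred$timeInterest and veteran.pred$ensemble
timeInterest <- c(1, 3, 8, 12)
ensemble <- rbind(c(0.05, 0.20, 0.60, 1.10),
                  c(0.01, 0.05, 0.15, 0.40))

# Ensemble CHF of every test individual at a single time t, treating each row
# as a right-continuous step function equal to 0 before the first time point
chf_at <- function(ensemble, timeInterest, t) {
  idx <- findInterval(t, timeInterest)  # last time point <= t
  if (idx == 0) return(rep(0, nrow(ensemble)))
  ensemble[, idx]
}

chf_10 <- chf_at(ensemble, timeInterest, 10)  # c(0.60, 0.15)
surv_10 <- exp(-chf_10)                       # survival at t = 10
```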

Author(s)

Hemant Ishwaran hemant.ishwaran@gmail.com and Udaya B. Kogalur ubk2101@columbia.edu

References

L. Breiman (2001). Random forests. Machine Learning, 45:5-32.

H. Ishwaran, U.B. Kogalur, E.H. Blackstone and M.S. Lauer (2008). Random survival forests. To appear in Ann. Appl. Statist.

H. Ishwaran and U.B. Kogalur (2007). Random survival forests for R. R News, 7(2):25-31.

See Also

rsf, print.rsf, plot.ensemble, plot.variable, plot.error, plot.proximity, pmml2rsf, rsf2pmml.

Examples

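library(randomSurvivalForest)
data(veteran, package = "randomSurvivalForest")
set.seed(1)  # reproducible 80/20 train/test split
train.pt <- sample(1:nrow(veteran), round(nrow(veteran)*0.80))
# forest = TRUE is required for predict.rsf to use the grown forest
veteran.out <- rsf(Survrsf(time, status) ~ ., forest = TRUE,
                   data = veteran[train.pt , ])
# Drop the held-out 20% of the data down the forest
veteran.pred <- predict.rsf(veteran.out, veteran[-train.pt , ])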

[Package randomSurvivalForest version 3.5.1 Index]