predict.rsf {randomSurvivalForest}    R Documentation

Random Survival Forest Prediction

Description

Prediction on new data using Random Survival Forests.

Usage

  predict.rsf(object = NULL,
              newdata = NULL,
              proximity = FALSE, 
              do.trace = FALSE,
              ...)

Arguments

object An object of class (rsf, grow) or (rsf, forest). Note that forest=TRUE must be used in the original rsf call for prediction to work.
newdata Data frame containing test data. Missing values (NA) are discouraged: any record with one or more NA entries is removed in its entirety.
proximity Logical. Should the proximity measure between test observations be calculated? The result can be very large. Default is FALSE.
do.trace Logical. Should trace output be enabled? Default is FALSE. Integer values can also be passed. A positive value causes output to be printed every do.trace iterations.
... Further arguments passed to or from other methods.
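
Because records containing any NA are removed, it can help to check beforehand how many test records would remain. The following is a minimal base-R sketch, assuming a hypothetical data frame test.data; it relies only on complete.cases() and does not call the package, so it mirrors the removal rule described for newdata rather than the package's internal code.

```r
# Hypothetical test data with some missing entries (illustration only).
test.data <- data.frame(
  time   = c(72, 411, 228, 126),
  status = c(1, 1, 0, 1),
  age    = c(69, NA, 38, 63),
  karno  = c(60, 70, NA, 60)
)

# TRUE for rows that have no NA in any column.
keep <- complete.cases(test.data)

# Rows that remain under "drop the record if any entry is NA".
test.clean <- test.data[keep, , drop = FALSE]

# Number of records that would be discarded.
n.dropped <- sum(!keep)
```

Here two of the four hypothetical records contain an NA, so only two would be available for prediction.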

Details

predict.rsf takes a test data set, drops it down the forest grown from the training data, and then computes ensemble cumulative hazard functions (CHFs). CHFs are predicted for all individuals in the test data set at the unique death time points of the original grow (training) data.
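
The time grid on which CHFs are reported can be illustrated without the package. The sketch below assumes hypothetical training vectors grow.time and grow.status (1 = death, 0 = censored) and derives the sorted unique death times, following the description above; it is not the package's internal code.

```r
# Hypothetical grow (training) data: follow-up times and event indicators.
grow.time   <- c(5, 8, 8, 12, 20, 20, 31)
grow.status <- c(1, 0, 1, 1, 0, 1, 0)   # 1 = death, 0 = censored

# Keep only times at which a death occurred, then de-duplicate and sort.
death.times <- sort(unique(grow.time[grow.status == 1]))
```

Censored-only times (31 in this toy data) do not appear in the grid, while a time shared by a death and a censored record (8 or 20) appears once.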

Value

An object of class (rsf, predict), which is a list with the following components:

call The original grow call to rsf.
forest The grow forest.
ntree Number of trees in grow forest.
leaf.count Number of terminal nodes for each tree in the grow forest. Vector of length ntree.
timeInterest Sorted unique event times from grow data. Ensemble values given for these time points only.
n Sample size of test data.
ndead Number of deaths in test data (can be NULL).
Time Vector recording survival times from test data (can be NULL).
Cens Vector recording censoring information from test data (can be NULL).
predictorNames Character vector of predictor names.
predictors Test data matrix of predictors used for prediction.
ensemble Matrix of the ensemble cumulative hazard function for the test data. Each row corresponds to a test data individual's CHF evaluated at each of the time points in timeInterest.
mortality A vector representing the estimated ensemble mortality for each individual in the test data. Ensemble mortality values should be interpreted in terms of total number of deaths.
err.rate Vector of length ntree containing error rate of the test data. Can be NULL.
proximity If proximity=TRUE, the proximity matrix for the test observations is computed. The value returned is a vector holding the lower diagonal of that matrix. Use plot.proximity() to extract this information.
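
To make the packed proximity format concrete, here is a base-R sketch that expands a lower-triangle vector into a full symmetric matrix. The vector prox.vec is hypothetical, and the layout (diagonal included, stored column by column) is an assumption not confirmed by this page; plot.proximity() remains the documented way to work with real output.

```r
n <- 4  # hypothetical number of test observations

# Hypothetical packed vector: lower triangle including the diagonal,
# stored column by column (an assumed layout, for illustration only).
prox.vec <- c(1.0, 0.2, 0.5, 0.1,
              1.0, 0.3, 0.4,
              1.0, 0.6,
              1.0)
stopifnot(length(prox.vec) == n * (n + 1) / 2)

# Fill the lower triangle, then mirror it into the upper triangle.
prox.mat <- matrix(0, n, n)
prox.mat[lower.tri(prox.mat, diag = TRUE)] <- prox.vec
prox.mat[upper.tri(prox.mat)] <- t(prox.mat)[upper.tri(prox.mat)]
```

The packed form needs n(n+1)/2 values instead of n^2, which is why the returned object is a vector rather than a full matrix.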

Note

The key deliverable is the matrix ensemble, which contains the estimated ensemble cumulative hazard function for each individual in the test data, evaluated at the distinct time points in timeInterest.
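
Since ensemble holds cumulative hazards, a survival curve for each individual can be approximated through the standard relation S(t) = exp(-H(t)). The sketch below uses small hypothetical objects named ensemble and timeInterest in place of real predict.rsf output (rows are individuals, columns are time points).

```r
# Hypothetical stand-ins for the timeInterest and ensemble components.
timeInterest <- c(5, 8, 12, 20)
ensemble <- rbind(
  c(0.05, 0.10, 0.30, 0.70),   # CHF for individual 1
  c(0.20, 0.45, 0.90, 1.60)    # CHF for individual 2
)

# Convert cumulative hazard to survival probability, element by element.
survival <- exp(-ensemble)

# Estimated survival probability at time 12 for every individual.
surv.at.12 <- survival[, timeInterest == 12]
```

With an actual prediction object, the same transformation would presumably be applied to its ensemble component.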

Author(s)

Hemant Ishwaran hemant.ishwaran@gmail.com and Udaya B. Kogalur ubk2101@columbia.edu

References

H. Ishwaran and U. B. Kogalur (2006). Random Survival Forests. Cleveland Clinic Technical Report.

L. Breiman (2001). Random forests. Machine Learning, 45:5-32.

See Also

rsf, print.rsf, plot.ensemble, plot.variable, plot.error, plot.proximity, pmml_to_rsf, rsf_to_pmml.

Examples

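library(randomSurvivalForest)
data(veteran, package = "randomSurvivalForest")
# Grow the forest; forest = TRUE is required for later prediction.
veteran.out <- rsf(Survrsf(time, status)~., forest = TRUE, data = veteran)
baseForest <- veteran.out$forest
# Predict using only the (rsf, forest) object.
veteran.pred1 <- predict.rsf(baseForest, veteran, proximity = FALSE)
# Predict using the full (rsf, grow) object and also compute proximity.
veteran.pred2 <- predict.rsf(veteran.out, veteran, proximity = TRUE)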

[Package randomSurvivalForest version 2.1.0 Index]