predict.rsf {randomSurvivalForest}    R Documentation

Random Survival Forest Prediction

Description

Prediction on new data using Random Survival Forests.

Usage

  predict.rsf(object = NULL,
              newdata = NULL,
              importance = TRUE,
              na.action = c("na.omit", "na.impute")[1],
              proximity = FALSE,
              seed = NULL,
              do.trace = FALSE,
              ...)

Arguments

object An object of class (rsf, grow) or (rsf, forest). Note that forest=TRUE must be used in the original rsf call for prediction to work.
newdata Data frame containing the test data. Missing values are allowed.
importance Logical. Should the importance of variables be estimated? Only applies when the test data contain outcomes.
na.action Action taken if the data contain NAs. Possible values are na.omit, which removes a record entirely if any of its entries is NA, and na.impute, which imputes missing values in the test data. See Details below.
proximity Logical. Should the proximity measure between test observations be calculated? The result can be large. Default is FALSE.
seed Seed for the random number generator. Must be a negative integer (the R wrapper handles incorrectly set seed values).
do.trace Logical. Should trace output be enabled? Default is FALSE. An integer can also be passed: a positive value causes output to be printed every do.trace iterations.
... Further arguments passed to or from other methods.

Details

predict.rsf takes a test data set, drops it down the forest grown from the training data, and computes an ensemble cumulative hazard function (CHF) for each individual in the test data. The CHFs are evaluated at the unique death times of the original grow (training) data. If outcome information is available in the test data, the error rate and variable importance values are also computed on the test data. Setting na.action to na.impute imputes missing test data (x-variables or outcomes) using the forest grown on the training data (Ishwaran et al. 2007); only the training data are used when imputing the test data.
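The ensemble step can be pictured with a small base-R sketch. It assumes, in line with Ishwaran et al. (2007), that each tree estimates a Nelson-Aalen CHF from the training cases in the terminal node a test case falls into, and that the ensemble CHF is the plain average of these tree CHFs over the grid timeInterest. Dropping the case down each tree is taken as already done: the terminal-node contents are supplied directly. The helpers nelson_aalen_chf and ensemble_chf and the toy node data are hypothetical and are not part of randomSurvivalForest.

```r
# Hypothetical helper: Nelson-Aalen CHF for the training cases in one
# terminal node, evaluated on a common time grid (e.g. timeInterest).
nelson_aalen_chf <- function(time, status, grid) {
  # Distinct times at which at least one death occurs in the node
  event_times <- sort(unique(time[status == 1]))
  # Hazard increment at each event time: deaths / number at risk
  increments <- vapply(event_times, function(t) {
    sum(time == t & status == 1) / sum(time >= t)
  }, numeric(1))
  # Step function: CHF at g sums the increments at event times <= g
  idx <- findInterval(grid, event_times)
  c(0, cumsum(increments))[idx + 1]
}

# Hypothetical helper: ensemble CHF for one test case. 'nodes' holds, for
# each tree, the training (time, status) values of the terminal node that
# the test case lands in.
ensemble_chf <- function(nodes, grid) {
  per_tree <- matrix(
    vapply(nodes, function(nd) nelson_aalen_chf(nd$time, nd$status, grid),
           numeric(length(grid))),
    nrow = length(grid)
  )
  # Columns are trees; the ensemble CHF is their row-wise average
  rowMeans(per_tree)
}

# Toy example with two trees
grid <- c(1, 2, 3, 4)  # plays the role of timeInterest
nodes <- list(
  list(time = c(1, 2, 3), status = c(1, 1, 0)),  # tree 1 terminal node
  list(time = c(2, 4),    status = c(1, 1))      # tree 2 terminal node
)
ensemble_chf(nodes, grid)  # 1/6, 2/3, 2/3, 7/6
```

Stacking one such vector per test individual row by row gives a matrix with the same layout as the ensemble component described under Value.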

Value

An object of class (rsf, predict), which is a list with the following components:

call The original grow call to rsf.
forest The grow forest.
ntree Number of trees in the grow forest.
leaf.count Number of terminal nodes for each tree in the grow forest. Vector of length ntree.
timeInterest Sorted unique event times from grow (training) data. Ensemble values given for these time points only.
n Sample size of the test data (depends on NAs; see na.action).
ndead Number of deaths in test data (can be NULL).
time Vector of survival times from test data (can be NULL).
cens Vector of censoring indicators from test data (can be NULL).
predictorNames Character vector of variable names.
predictors Test data matrix of x-variables used for prediction.
ensemble Matrix containing the ensemble CHF for the test data. Each row corresponds to a test data individual's CHF evaluated at each of the time points in timeInterest.
mortality Vector containing the ensemble mortality for each individual in the test data. Ensemble mortality values should be interpreted in terms of the total number of deaths in the training data.
err.rate Vector of length ntree containing the error rate on the test data. Can be NULL.
importance Importance measure of each variable in the test data. Can be NULL.
proximity If proximity = TRUE, a matrix recording the proximity between the test data inputs is computed. The value returned is a vector holding the lower-triangular part of this matrix. Use plot.proximity() to extract this information.
imputedIndv Vector of indices of records in test data with missing values. Can be NULL.
imputedData Matrix of imputed test data. First two columns are censoring and survival time, respectively. The remaining columns are the x-variables. Row i contains imputed outcomes and x-variables for row imputedIndv[i] of predictors. Can be NULL.
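To relate mortality to err.rate, the following sketch computes an error rate as 1 minus Harrell's concordance index, using mortality as the risk score (higher mortality meaning worse prognosis). That this is how randomSurvivalForest defines its error rate is an assumption based on Ishwaran et al. (2007); the helper harrell_error is hypothetical, and this simplified version skips pairs with tied survival times, which the package may handle differently.

```r
# Hypothetical helper: 1 - Harrell's C for right-censored data, where a
# larger 'risk' value predicts a shorter survival time.
harrell_error <- function(time, status, risk) {
  concordant <- 0
  permissible <- 0
  for (i in seq_along(time)) {
    # Only an observed death can anchor a comparable pair
    if (status[i] != 1) next
    for (j in seq_along(time)) {
      # Pair (i, j) is comparable when i died before j's observed time
      if (time[i] < time[j]) {
        permissible <- permissible + 1
        if (risk[i] > risk[j]) {
          concordant <- concordant + 1    # higher risk died first
        } else if (risk[i] == risk[j]) {
          concordant <- concordant + 0.5  # tied predictions count half
        }
      }
    }
  }
  # NaN if there are no comparable pairs
  1 - concordant / permissible
}

# Toy example: 5 comparable pairs, 4 of them concordant
time      <- c(2, 4, 6, 8)
status    <- c(1, 1, 0, 1)
mortality <- c(9, 5, 6, 1)
harrell_error(time, status, mortality)  # 0.2
```

An error rate of 0.5 corresponds to random ranking, and 0 to a perfect ordering of the deaths.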
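The row correspondence stated for imputedData can be used to fill in the missing entries of predictors. The sketch below works on a hand-built list, pred, that only mimics the documented components predictors, imputedIndv and imputedData; it is not real output from predict.rsf. It also assumes that the x-variable columns of imputedData appear in the same order as the columns of predictors. The helper fill_imputed is hypothetical.

```r
# Hypothetical helper: replace rows of predictors by their imputed versions
fill_imputed <- function(pred) {
  x <- pred$predictors
  if (is.null(pred$imputedIndv)) return(x)  # nothing was imputed
  # Drop the censoring and survival-time columns, keeping the x-variables
  x_imp <- pred$imputedData[, -(1:2), drop = FALSE]
  # Row i of imputedData belongs to row imputedIndv[i] of predictors
  x[pred$imputedIndv, ] <- x_imp
  x
}

# Mock object with made-up values mimicking the documented components
pred <- list(
  predictors  = cbind(age = c(61, NA, 45), karno = c(70, 60, NA)),
  imputedIndv = c(2L, 3L),
  imputedData = cbind(cens = c(1, 0), time = c(100, 250),
                      age = c(52, 45), karno = c(60, 65))
)
filled <- fill_imputed(pred)
filled  # rows: (61, 70), (52, 60), (45, 65)
```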

Note

The key deliverable is the matrix ensemble which contains the ensemble CHF for each individual in the test data evaluated at a set of distinct time points.
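Because ensemble is only given on the grid timeInterest, values at other times have to be read off as a step function. The sketch assumes the CHF is constant between grid points (as for a Nelson-Aalen-type estimate), takes it to be 0 before the first grid point, and uses the relation S(t) = exp(-H(t)) to turn it into a survival curve. The helper chf_at and the toy matrix standing in for the ensemble component are hypothetical.

```r
# Hypothetical helper: read the ensemble CHF at arbitrary times t.
# Rows of 'ensemble' are individuals; columns match 'timeInterest'.
chf_at <- function(ensemble, timeInterest, t) {
  # Index of the last grid point <= t (0 if t precedes the first point)
  idx <- findInterval(t, timeInterest)
  # Prepend a zero column so that idx = 0 maps to a CHF of 0
  cbind(0, ensemble)[, idx + 1, drop = FALSE]
}

# Toy stand-in for the ensemble matrix: 2 test cases x 3 grid times
timeInterest <- c(5, 10, 20)
ensemble <- rbind(c(0.10, 0.4, 0.9),
                  c(0.05, 0.2, 0.5))
H <- chf_at(ensemble, timeInterest, c(3, 12, 25))
S <- exp(-H)  # survival under the S = exp(-H) relation
H             # rows: (0, 0.4, 0.9) and (0, 0.2, 0.5)
```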

Author(s)

Hemant Ishwaran hemant.ishwaran@gmail.com and Udaya B. Kogalur ubk2101@columbia.edu

References

H. Ishwaran, U.B. Kogalur, E.H. Blackstone and M.S. Lauer (2007). Random Survival Forests. Cleveland Clinic Technical Report.

L. Breiman (2001). Random forests. Machine Learning, 45:5-32.

See Also

rsf, print.rsf, plot.ensemble, plot.variable, plot.error, plot.proximity, pmml_to_rsf, rsf_to_pmml.

Examples

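library(randomSurvivalForest)
data(veteran, package = "randomSurvivalForest")
# Grow the forest; forest = TRUE stores it so that predict.rsf can use it
veteran.out <- rsf(Survrsf(time, status)~., forest = TRUE, data = veteran)
# Predict from the stored (rsf, forest) object
baseForest <- veteran.out$forest
veteran.pred1 <- predict.rsf(baseForest, veteran, proximity = FALSE)
# Predict from the (rsf, grow) object, also computing test-data proximity
veteran.pred2 <- predict.rsf(veteran.out, veteran, proximity = TRUE)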

[Package randomSurvivalForest version 3.0.1 Index]