predict.rsf {randomSurvivalForest} | R Documentation |
Prediction on new data using Random Survival Forests.
predict.rsf(object = NULL, newdata = NULL, proximity = FALSE, do.trace = FALSE, ...)
object |
An object of class (rsf, grow) or (rsf, forest). Note that forest = TRUE must be used in the original rsf call for prediction to work. |
newdata |
Data frame containing test data. Missing values (NA) are discouraged: any record with even one NA entry is removed entirely. |
proximity |
Logical. Should the proximity measure between test observations be calculated? The result can be very large. Default is FALSE. |
do.trace |
Logical. Should trace output be enabled? Default is FALSE. Integer values can also be passed: a positive value causes output to be printed every do.trace iterations. |
... |
Further arguments passed to or from other methods. |
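Because any record with a missing entry is discarded, it can help to check in advance which rows of the test data would be lost. The following is a minimal base R sketch of that check; it does not call the package, and the data frame and its values are made up for illustration.

```r
# Hypothetical test data: column names mimic the veteran data, values are invented.
newdata <- data.frame(
  time   = c(72, 411, NA, 126),
  status = c(1, 1, 1, 0),
  age    = c(69, 64, 38, NA),
  karno  = c(60, 70, 60, 60)
)

# Row indices with at least one NA: the records that would be discarded.
dropped <- which(!complete.cases(newdata))

# The complete records that would remain available for prediction.
kept <- newdata[complete.cases(newdata), , drop = FALSE]

print(dropped)
print(nrow(kept))
```

Running this kind of check before prediction makes it clear how many test individuals will actually receive an ensemble estimate.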
predict.rsf takes a test data set, drops it down the forest grown from the training data, and then computes ensemble cumulative hazard functions (CHFs). CHFs are predicted for all individuals in the test data set at the unique death time points of the original grow (training) data.
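To make "unique death time points" concrete, the sketch below derives such a grid from a small training set using base R only. It is an illustration under stated assumptions, not the package's internal code: the data frame train, its values, and the name time.points are hypothetical, and status is assumed to be coded 1 for death and 0 for censoring.

```r
# Hypothetical training data (values invented for illustration).
train <- data.frame(
  time   = c(5, 8, 8, 12, 20, 20, 31),
  status = c(1, 1, 0, 1, 0, 1, 1)   # assumed coding: 1 = death, 0 = censored
)

# Keep only times at which a death was observed, drop duplicates, and sort.
time.points <- sort(unique(train$time[train$status == 1]))

print(time.points)
```

Censored-only times do not contribute to the grid, so the number of time points can be much smaller than the number of training records.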
An object of class (rsf, predict), which is a list with the following components:
call |
The original grow call to rsf . |
forest |
The grow forest. |
ntree |
Number of trees in grow forest. |
leaf.count |
Number of terminal nodes for each tree in the grow forest. Vector of length ntree. |
timeInterest |
Sorted unique event times from grow data. Ensemble values are given for these time points only. |
n |
Sample size of test data. |
ndead |
Number of deaths in test data (can be NULL). |
Time |
Vector recording survival times from test data (can be NULL). |
Cens |
Vector recording censoring information from test data (can be NULL). |
predictorNames |
Character vector of predictor names. |
predictors |
Test data matrix of predictors used for prediction. |
ensemble |
Matrix of the ensemble cumulative hazard function for the test data. Each row corresponds to a test data individual's CHF evaluated at each of the time points in timeInterest. |
mortality |
A vector representing the estimated ensemble mortality for each individual in the test data. Ensemble mortality values should be interpreted in terms of total number of deaths. |
err.rate |
Vector of length ntree containing the error rate of the test data. Can be NULL. |
proximity |
If proximity = TRUE, a matrix recording proximity of the inputs from test data is computed. The value returned is a vector of the lower diagonal of the matrix. Use plot.proximity() to extract this information. |
The key deliverable is the matrix ensemble, which contains the estimated ensemble cumulative hazard function for each individual in the test data evaluated at a set of distinct time points.
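A matrix of cumulative hazards laid out this way (one row per individual, one column per time point) can be post-processed with base R. The sketch below uses a made-up matrix chf rather than real predict.rsf output. The conversion S(t) = exp(-H(t)) is the standard relation between survival and cumulative hazard. Summing each row of the CHF is shown as one plausible per-individual summary, on the assumption that it resembles how ensemble mortality is usually described for random survival forests; that assumption should be checked against the package before relying on it.

```r
# Hypothetical ensemble CHF matrix: 2 individuals (rows) x 3 time points (columns).
chf <- matrix(c(0.10, 0.30, 0.60,
                0.20, 0.70, 1.50),
              nrow = 2, byrow = TRUE)
time.points <- c(5, 8, 12)   # hypothetical time grid matching the columns

# Standard identity: survival probability S(t) = exp(-H(t)).
surv <- exp(-chf)

# Per-individual sum of the CHF over the time grid.
# Assumption: used here only as an illustrative mortality-style summary.
chf.sum <- rowSums(chf)

print(round(surv, 3))
print(chf.sum)
```

Larger row sums indicate individuals with higher accumulated hazard, so the second individual in this toy matrix would be ranked as higher risk than the first.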
Hemant Ishwaran hemant.ishwaran@gmail.com and Udaya B. Kogalur ubk2101@columbia.edu
H. Ishwaran and Udaya B. Kogalur (2006). Random Survival Forests. Cleveland Clinic Technical Report.
L. Breiman (2001). Random forests, Machine Learning, 45:5-32.
rsf, print.rsf, plot.ensemble, plot.variable, plot.error, plot.proximity, pmml_to_rsf, rsf_to_pmml.
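# Grow a forest on the veteran data; forest = TRUE is required for prediction.
data(veteran, package = "randomSurvivalForest")
veteran.out <- rsf(Survrsf(time, status)~., forest = TRUE, data = veteran)

# Predict from the (rsf, forest) object alone, without proximity.
baseForest <- veteran.out$forest
veteran.pred1 <- predict.rsf(baseForest, veteran, proximity = FALSE)

# Predict from the full (rsf, grow) object and request the proximity measure.
veteran.pred2 <- predict.rsf(veteran.out, veteran, proximity = TRUE)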