predict.rsf {randomSurvivalForest} | R Documentation |
Prediction on new data using Random Survival Forests.
predict.rsf(object = NULL, newdata = NULL, importance = TRUE, na.action = c("na.omit", "na.impute")[1], proximity = FALSE, seed = NULL, do.trace = FALSE, ...)
object |
An object of class (rsf, grow) or (rsf, forest).
Note that forest = TRUE must be used in the
original rsf call for prediction to work. |
newdata |
Data frame containing test data. Missing values allowed. |
importance |
Logical. Should importance of variables be estimated? Only applies when test data contains outcomes. |
na.action |
Action taken if the data contain NA's. Possible
values are na.omit, which removes the entire record if
even one of its entries is NA, and na.impute, which
imputes the test data. See details below. |
proximity |
Logical. Should proximity measure between test observations be calculated? Can be large. Default is FALSE. |
seed |
Seed for random number generator. Must be a negative integer (the R wrapper handles incorrectly set seed values). |
do.trace |
Logical. Should trace output be enabled? Default is
FALSE. Integer values can also be passed. A positive value
causes output to be printed every do.trace iterations. |
... |
Further arguments passed to or from other methods. |
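The record-dropping behaviour described for na.omit can be illustrated with base R alone. The sketch below does not call the package; the data frame and its column names are made up for illustration, and complete.cases() is used only to mimic the documented rule that a record is removed if even one entry is NA.

```r
# Hypothetical test data with missing entries (names are illustrative only)
test <- data.frame(
  age   = c(60, NA, 45, 70),
  karno = c(80, 60, NA, 90),
  time  = c(100, 200, 50, 300)
)

# Keep only records with no missing entries, mirroring the na.omit rule
complete <- test[complete.cases(test), , drop = FALSE]

nrow(test)      # 4 records supplied
nrow(complete)  # 2 records remain
```

Under na.omit the returned sample size n would reflect the reduced record count, whereas na.impute is documented to keep the records and fill them in.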
predict.rsf takes a test data set, drops it down the forest
grown from the training data, and then computes an ensemble
cumulative hazard function (CHF). CHFs are computed for all
individuals in the test data set at the unique death time points of
the original grow (training) data. The error rate and importance
values for variables are computed on the test data if outcome
information is available. Setting the option na.action to na.impute
imputes missing test data (x-variables or outcomes)
using the forest grown on the training data (Ishwaran et al. 2007).
Only training data is used in imputing test data.
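A CHF can be turned into a survival estimate through the standard relation S(t) = exp(-H(t)). The sketch below applies that relation to a mock matrix standing in for the ensemble CHF; the numbers are made up, and the conversion is a general survival-analysis identity rather than something this function is documented to return.

```r
# Mock stand-ins for the time points and ensemble CHF (values are made up)
timeInterest <- c(5, 12, 30, 61)
ensemble <- rbind(
  c(0.05, 0.20, 0.55, 1.10),
  c(0.01, 0.08, 0.25, 0.60)
)

# Survival estimate implied by a CHF: S(t) = exp(-H(t))
survival <- exp(-ensemble)
colnames(survival) <- timeInterest

# One row per individual, one column per time point
round(survival, 3)
```

Because a CHF is non-decreasing in time, each row of the resulting matrix is non-increasing.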
An object of class (rsf, predict), which is a list with the
following components:
call |
The original grow call to rsf. |
forest |
The grow forest. |
ntree |
Number of trees in the grow forest. |
leaf.count |
Number of terminal nodes for each tree in the
grow forest. Vector of length ntree. |
timeInterest |
Sorted unique event times from grow (training) data. Ensemble values given for these time points only. |
n |
Sample size of test data (depends upon NA's, see na.action). |
ndead |
Number of deaths in test data (can be NULL). |
time |
Vector of survival times from test data (can be NULL). |
cens |
Vector of censoring indicators from test data (can be NULL). |
predictorNames |
Character vector of variable names. |
predictors |
Test data matrix of x-variables used for prediction. |
ensemble |
Matrix containing the ensemble CHF for the test data. Each
row corresponds to a test data individual's CHF evaluated at
each of the time points in timeInterest. |
mortality |
Vector containing ensemble mortality for each individual in the test data. Ensemble mortality values should be interpreted in terms of total number of training deaths. |
err.rate |
Vector of length ntree containing error
rate of the test data. Can be NULL. |
importance |
Importance measure of each variable in the test data. Can be NULL. |
proximity |
If proximity = TRUE, a matrix recording
proximity of the inputs from test data is computed. Value
returned is a vector of the lower diagonal of the matrix. Use
plot.proximity() to extract this information. |
imputedIndv |
Vector of indices of records in test data with missing values. Can be NULL. |
imputedData |
Matrix of imputed test data. First two columns
are censoring and survival time, respectively. The remaining
columns are the x-variables. Row i contains imputed outcomes
and x-variables for row imputedIndv[i] of
predictors. Can be NULL. |
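The relationship between imputedIndv, imputedData and predictors can be sketched with mock objects. Everything below is illustrative: the values and the column names (cens, time, age, karno) are assumptions, and the code relies only on the layout described above, namely that the first two columns of imputedData are outcomes and that row i belongs to row imputedIndv[i] of predictors.

```r
# Mock stand-ins for components of the returned list (values are made up)
predictors <- matrix(
  c(60, NA, 45,
    80, 60, NA),
  nrow = 3,
  dimnames = list(NULL, c("age", "karno"))
)
imputedIndv <- c(2, 3)
imputedData <- rbind(
  c(1, 200, 58, 60),
  c(0,  50, 45, 75)
)
colnames(imputedData) <- c("cens", "time", "age", "karno")

# Drop the two outcome columns and write the imputed x-variables back
filled <- predictors
filled[imputedIndv, ] <- imputedData[, -(1:2), drop = FALSE]

filled
```

When either component is NULL there is nothing to merge, so a real call would need to check for that first.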
The key deliverable is the matrix ensemble, which contains the
ensemble CHF for each individual in the test data evaluated at a set
of distinct time points.
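Since the ensemble CHF is given only at the time points in timeInterest, reading it at some other time requires a convention. The sketch below uses a hypothetical helper, chf_at, which is not part of the package. It assumes the CHF is a right-continuous step function that is zero before the first event time. The matrix and time points are made-up stand-ins for the ensemble and timeInterest components.

```r
# Mock stand-ins for the time points and ensemble CHF (values are made up)
timeInterest <- c(5, 12, 30, 61)
ensemble <- rbind(
  c(0.05, 0.20, 0.55, 1.10),
  c(0.01, 0.08, 0.25, 0.60)
)

# chf_at is a hypothetical helper, not a package function.
# Assumption: the CHF is a step function, zero before the first event time.
chf_at <- function(ensemble, timeInterest, t) {
  j <- findInterval(t, timeInterest)  # index of the last event time <= t
  if (j == 0) return(rep(0, nrow(ensemble)))
  ensemble[, j]
}

chf_at(ensemble, timeInterest, 20)  # values from the column for time 12
chf_at(ensemble, timeInterest, 2)   # zeros, before the first event time
```

Other conventions, such as linear interpolation between event times, are equally possible; the step-function choice is used here only because it adds no values beyond those stored in the matrix.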
Hemant Ishwaran hemant.ishwaran@gmail.com and Udaya B. Kogalur ubk2101@columbia.edu
H. Ishwaran, U.B. Kogalur, E.H. Blackstone and M.S. Lauer (2007). Random Survival Forests. Cleveland Clinic Technical Report.
L. Breiman (2001). Random forests. Machine Learning, 45:5-32.
rsf, print.rsf, plot.ensemble, plot.variable, plot.error, plot.proximity, pmml_to_rsf, rsf_to_pmml.
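data(veteran, package = "randomSurvivalForest")

# Grow the forest; forest = TRUE is required for later prediction
veteran.out <- rsf(Survrsf(time, status)~., forest = TRUE, data = veteran)
baseForest <- veteran.out$forest

# Predict from the forest object alone
veteran.pred1 <- predict.rsf(baseForest, veteran, proximity = FALSE)

# Predict from the full grow object, also computing proximity
veteran.pred2 <- predict.rsf(veteran.out, veteran, proximity = TRUE)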