predict.rsf {randomSurvivalForest} | R Documentation |
Prediction on test data using Random Survival Forests.
predict.rsf(object = NULL, test = NULL, importance = c("randomsplit", "permute", "none")[1], na.action = c("na.omit", "na.impute")[1], proximity = FALSE, seed = NULL, do.trace = FALSE, ...)
object |
An object of class (rsf, grow) or (rsf, forest). Note that forest=TRUE must be used in the
original rsf call for prediction to work. |
test |
Data frame containing test data. Missing values allowed. |
importance |
Method used to compute variable importance (VIMP). Only applies when test data contains outcomes. |
na.action |
Action to be taken if the data contain NAs. Possible
values are na.omit, which removes the entire record if
even one of its entries is NA, and na.impute, which
imputes the test data. See details below. |
proximity |
Logical. Should the proximity measure between test observations be calculated? The result can be large. Default is FALSE. |
seed |
Seed for random number generator. Must be a negative integer (the R wrapper handles incorrectly set seed values). |
do.trace |
Logical. Should trace output be enabled? Default is
FALSE. Integer values can also be passed. A positive value
causes output to be printed every do.trace iterations. |
... |
Further arguments passed to or from other methods. |
predict.rsf takes a test data set, drops it down the forest
grown from the training data, and computes an ensemble cumulative
hazard function (CHF). CHFs are calculated for each individual in
the test data at all unique death time points from the original grow
(training) data. The overall error rate and the VIMP for each variable
are computed for the test data if outcome information is available.
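As a rough illustration of how an error rate can be obtained once test outcomes are available, the sketch below computes one minus a simplified Harrell's concordance index in base R, the measure described in Ishwaran et al. (2008). The helper name concordance_error and the toy vectors are hypothetical and not part of randomSurvivalForest. The package's own handling of ties may differ from this simplified version.

```r
# Hypothetical helper, not part of randomSurvivalForest.
# Returns 1 - C, where C is a simplified Harrell's concordance index.
concordance_error <- function(time, status, risk) {
  n <- length(time)
  permissible <- 0
  concordant <- 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # Simplification: pairs with tied survival times are skipped.
      if (time[i] == time[j]) next
      short <- if (time[i] < time[j]) i else j
      long <- if (short == i) j else i
      # A pair is usable only if the shorter time is an observed death.
      if (status[short] != 1) next
      permissible <- permissible + 1
      if (risk[short] > risk[long]) {
        concordant <- concordant + 1
      } else if (risk[short] == risk[long]) {
        concordant <- concordant + 0.5
      }
    }
  }
  if (permissible == 0) return(NA_real_)
  1 - concordant / permissible
}

# Toy test outcomes and predicted risk (for example, ensemble mortality).
toy_time <- c(5, 10, 15, 20)
toy_status <- c(1, 0, 1, 1)   # 1 = death, 0 = censored
toy_risk <- c(3.0, 2.5, 1.0, 1.2)
toy_error <- concordance_error(toy_time, toy_status, toy_risk)
```

A value near 0 indicates that higher predicted risk lines up with shorter survival, while a value near 0.5 is no better than random ordering.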
Setting na.action=na.impute imputes missing test data
(x-variables or outcomes) using the forest grown from the training
data (Ishwaran et al. 2008). Only training data is used in imputing
test data to avoid biasing error rates.
An object of class (rsf, predict), which is a list with the
following components:
call |
The original grow call to rsf. |
forest |
The grow forest. |
ntree |
Number of trees in the grow forest. |
leaf.count |
Number of terminal nodes for each tree in the
grow forest. Vector of length ntree. |
timeInterest |
Sorted unique event times from grow (training) data. Ensemble values given for these time points only. |
n |
Sample size of test data (depends upon NAs, see na.action). |
ndead |
Number of deaths in test data (can be NULL). |
time |
Vector of survival times from test data (can be NULL). |
cens |
Vector of censoring indicators from test data (can be NULL). |
predictorNames |
Character vector of variable names. |
predictors |
Data frame comprising x-variables used for prediction. |
ensemble |
Matrix containing the ensemble CHF for the test data. Each
row corresponds to a test data individual's CHF evaluated at
each of the time points in timeInterest. |
mortality |
Vector containing ensemble mortality for each individual in the test data. Ensemble mortality values should be interpreted in terms of total number of training deaths. |
err.rate |
Vector of length ntree containing error
rate of the test data. Can be NULL. |
importance |
VIMP of each variable in the test data. Can be NULL. |
proximity |
If proximity=TRUE, a matrix recording
proximity of the inputs from test data is computed. Value
returned is a vector of the lower diagonal of the matrix. Use
plot.proximity() to extract this information. |
imputedIndv |
Vector of indices of records in test data with missing values. Can be NULL. |
imputedData |
Data frame comprising imputed test data. First
two columns are censoring and survival time, respectively. The
remaining columns are the x-variables. Row i contains imputed
outcomes and x-variables for row imputedIndv[i] of
predictors. Can be NULL. |
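The relationship between imputedIndv, imputedData and predictors can be sketched in base R as follows. The data frames here are toy stand-ins with hypothetical values, laid out as described above (censoring and survival time first, then the x-variables); they are not output from an actual predict.rsf call.

```r
# Toy stand-ins for components of a prediction object (hypothetical values).
predictors <- data.frame(age = c(60, NA, 45, 70), karno = c(80, 60, NA, 40))
imputedIndv <- c(2, 3)  # rows of `predictors` that had missing values
imputedData <- data.frame(
  cens  = c(1, 0),
  time  = c(120, 300),
  age   = c(58, 45),
  karno = c(60, 70)
)

# Drop the two outcome columns, then write the imputed x-variables
# back into the matching rows of a copy of `predictors`.
completed <- predictors
completed[imputedIndv, ] <- imputedData[, -(1:2), drop = FALSE]
```

Row i of imputedData lands in row imputedIndv[i] of the copy, so rows without missing values are left untouched.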
The key deliverable is the matrix ensemble, which contains the
ensemble CHF for each individual in the test data evaluated at a set
of distinct time points.
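One way to work with such a matrix is sketched below in base R. The matrix and time points are toy values standing in for the ensemble and timeInterest components, and chf_at is a hypothetical helper. The conversion S(t) = exp(-H(t)) is the standard relation between a cumulative hazard and a survival function, not something this page states the package computes. Treating the CHF as a step function that is zero before the first time point is likewise an assumption.

```r
# Toy stand-ins for `timeInterest` and `ensemble` (hypothetical values).
time_interest <- c(5, 10, 20, 40)
ensemble <- rbind(
  c(0.05, 0.10, 0.30, 0.60),  # individual 1
  c(0.20, 0.45, 0.90, 1.50)   # individual 2
)

# Survival estimates implied by the cumulative hazard: S(t) = exp(-H(t)).
survival <- exp(-ensemble)

# Hypothetical helper: read one individual's CHF at an arbitrary time t,
# assuming a right-continuous step function that is 0 before the first time.
chf_at <- function(chf_row, times, t) {
  idx <- findInterval(t, times)
  if (idx == 0) 0 else chf_row[idx]
}

chf_25 <- chf_at(ensemble[2, ], time_interest, 25)  # uses the value at t = 20
```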
Hemant Ishwaran hemant.ishwaran@gmail.com and Udaya B. Kogalur ubk2101@columbia.edu
L. Breiman (2001). Random forests, Machine Learning, 45:5-32.
H. Ishwaran, U.B. Kogalur, E.H. Blackstone and M.S. Lauer (2008). Random survival forests, To appear in Ann. Appl. Statist.
H. Ishwaran, U.B. Kogalur (2007). Random survival forests for R, Rnews, 7/2:25-31.
rsf, print.rsf, plot.ensemble, plot.variable, plot.error, plot.proximity, pmml2rsf, rsf2pmml.
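library(randomSurvivalForest)
data(veteran, package = "randomSurvivalForest")
# Hold out 20% of the records as test data.
train.pt <- sample(1:nrow(veteran), round(nrow(veteran)*0.80))
# forest = TRUE is required so that the grow object can be used for prediction.
veteran.out <- rsf(Survrsf(time, status) ~ ., forest = TRUE, data = veteran[train.pt , ])
veteran.pred <- predict.rsf(veteran.out, veteran[-train.pt , ])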