plot.variable {randomSurvivalForest} | R Documentation |
Plot of ensemble mortality, predicted survival, or predicted survival time against a given x-variable. Users can select between marginal and partial plots.
plot.variable(x, plots.per.page = 4, granule = 5, sorted = TRUE, type = c("mort", "rel.freq", "surv", "time")[1], partial = FALSE, predictorNames = NULL, npred = NULL, npts = 25, subset = NULL, percentile = 50, ...)
x |
An object of class (rsf, grow) or (rsf, predict). |
plots.per.page |
Integer value controlling page layout. |
granule |
Integer value controlling whether a plot for a specific variable should be given as a boxplot or scatter plot. Larger values coerce boxplots. |
sorted |
Should variables be sorted by importance values (only applies if importance values are available)? Default is TRUE. |
type |
Select type of value to be plotted on the vertical axis. See details. |
partial |
Logical. Should partial plots be created? Default is FALSE. |
predictorNames |
Character vector of x-variable names. Only these variables will be plotted. Default is all. |
npred |
Number of variables to be plotted (only applies when predictorNames = NULL). Default is all. |
npts |
Maximum number of points used when generating partial plots for continuous variables. |
subset |
Indices indicating which rows of the predictor matrix are to be used (note: this applies to the processed predictor matrix, predictors, of the object). Default is to use all rows. |
percentile |
Percentile of follow-up time used for plotting predicted survival. See details below. |
... |
Further arguments passed to or from other methods. |
Either mortality, relative frequency of mortality, predicted
survival, or predicted survival time is plotted on the vertical
axis (y-value) against x-variables on the horizontal axis. The
choice of x-variables is specified by predictorNames or by npred.
The choice of y-value is controlled by type. There are 4 choices:
(1) mort is ensemble mortality; (2) rel.freq is standardized
mortality; (3) surv is predicted survival at a given time point
(the default is the median follow-up time, which can be changed
using the option percentile); (4) time is the predicted survival
time (this last option applies only to partial plots). For
continuous variables, points are colored so that blue corresponds
to events, whereas black points represent censored observations.
Ensemble mortality values should be interpreted in terms of total
number of deaths. For example, if individual i has a mortality
value of 100, then if all individuals were the same as i, we would
expect to find 100 deaths on average in the data.
If type is set to rel.freq, then mortality values are divided by an
adjusted sample size, defined as the maximum of the sample size and
the maximum mortality value. Standardized mortality values do not
indicate total deaths, but rather relative mortality.
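The standardization described above can be sketched in a few lines of base R. This is a minimal illustration of the stated definition, not the package's internal code; the mortality values and the variable names (mortality, adj.n, rel.freq) are hypothetical.

```r
# Hypothetical ensemble mortality values for five individuals (illustrative only).
mortality <- c(12.5, 40.0, 7.5, 100.0, 20.0)
n <- length(mortality)  # sample size

# Adjusted sample size: the larger of the sample size and the maximum mortality.
adj.n <- max(n, max(mortality))

# Standardized (relative) mortality: each value divided by the adjusted sample size.
rel.freq <- mortality / adj.n
print(rel.freq)
```

Because the divisor is at least as large as the maximum mortality value, the standardized values never exceed 1.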
Partial plots are created when partial = TRUE. Their interpretation
differs from that of marginal plots. The y-value for a variable X,
evaluated at X = x, is
\tilde{f}(x) = \frac{1}{n} \sum_{i=1}^n \hat{f}(x, x_{i,O}),
where x_{i,O} represents the values of all variables other than X
for individual i and \hat{f} is the predicted value. Generating
partial plots can be very slow. Choosing a small value for npts can
reduce computation time, as this restricts the number of distinct
x values used in computing \tilde{f}.
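The averaging in the formula above can be sketched with any fitted model that has a predict method. The example below is a rough illustration only: it uses lm on simulated data as a stand-in for the forest, and the helper partial.value and the data frame dat are hypothetical names that are not part of randomSurvivalForest.

```r
set.seed(1)
n <- 200
dat <- data.frame(x1 = runif(n), x2 = rnorm(n))
dat$y <- 2 * dat$x1 + dat$x2 + rnorm(n, sd = 0.1)

# Stand-in predictor (assumption: any model with a predict method would do).
fit <- lm(y ~ x1 + x2, data = dat)

# Partial value at X = x: set the variable to x for every individual,
# keep all other variables at their observed values, and average the predictions.
partial.value <- function(fit, data, var, x) {
  data[[var]] <- x
  mean(predict(fit, newdata = data))
}

# Evaluate on a grid of npts distinct x values; fewer points means fewer
# prediction passes over the data.
npts <- 10
grid <- seq(min(dat$x1), max(dat$x1), length.out = npts)
f.tilde <- sapply(grid, function(x) partial.value(fit, dat, "x1", x))
```

Each grid point requires one full prediction pass over all n individuals, which is why the cost grows with npts.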
For continuous variables, red points indicate partial values and dashed red lines represent a lowess-smoothed error bar of +/- two standard errors. The black dashed line is the lowess estimate of the partial values. For discrete variables, partial values are shown as boxplots with whiskers extending approximately two standard errors from the mean. Standard errors are meant only as a guide and should be interpreted with caution.
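A display of this kind for a continuous variable can be approximated in base R with lowess. The sketch below assumes the error bars are obtained by smoothing the partial values shifted by two standard errors in each direction; this is a guess at the construction, not the package's documented method, and the partial values and standard errors are simulated.

```r
set.seed(2)
x <- seq(0, 10, length.out = 25)
partial <- sin(x / 2) + rnorm(25, sd = 0.1)  # hypothetical partial values
se <- runif(25, 0.05, 0.15)                  # hypothetical standard errors

# Lowess smooth of the partial values and of the +/- 2 SE bounds.
fit.mid <- lowess(x, partial)
fit.lo <- lowess(x, partial - 2 * se)
fit.hi <- lowess(x, partial + 2 * se)

# Red points for partial values, black dashed smooth, red dashed error bars.
plot(x, partial, col = "red", pch = 16,
     ylim = range(fit.lo$y, fit.hi$y, partial))
lines(fit.mid, lty = 2, col = "black")
lines(fit.lo, lty = 2, col = "red")
lines(fit.hi, lty = 2, col = "red")
```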
Partial plots can be slow. Setting type to time can greatly speed
things up. Setting npts to a smaller number should also be tried.
For competing risk analyses, plots correspond to unconditional values
(i.e., they are non-event specific). Use competing.risk for
event-specific curves and for a more comprehensive analysis in such
cases.
Hemant Ishwaran hemant.ishwaran@gmail.com and Udaya B. Kogalur kogalurshear@gmail.com
H. Ishwaran, U.B. Kogalur (2007). Random survival forests for R, Rnews, 7/2:25-31.
J.H. Friedman (2001). Greedy function approximation: a gradient boosting machine, Ann. of Stat., 5:1189-1232.
A. Liaw and M. Wiener (2002). Classification and regression by randomForest, R News, 2:18-22.
competing.risk, rsf, predict.rsf.
#------------------------------------------------------------------------
# Some examples: veteran data.

data(veteran, package = "randomSurvivalForest")
v.out <- rsf(Survrsf(time, status) ~ ., veteran, forest = TRUE,
             nsplit = 10, ntree = 1000)
plot.variable(v.out, plots.per.page = 3)
plot.variable(v.out, plots.per.page = 2,
              predictorNames = c("trt", "karno", "age"))
plot.variable(v.out, type = "surv", npred = 1, percentile = 50)
plot.variable(v.out, type = "rel.freq", partial = TRUE,
              plots.per.page = 2, npred = 3)

## Not run: 
#------------------------------------------------------------------------
# Fast partial plots using 'time' type.
# Top 8 predictors from PBC data.

data(pbc, package = "randomSurvivalForest")
pbc.out <- rsf(Survrsf(days, status) ~ ., pbc, ntree = 1000,
               nsplit = 3, forest = TRUE)
plot.variable(pbc.out, type = "time", partial = TRUE, npred = 8)

## End(Not run)