plot.variable {randomSurvivalForest}    R Documentation

Plot of Ensemble Survival Effect of Predictors

Description

Plot of ensemble mortality for each predictor. Users can select between marginal and partial plots.

Usage

    plot.variable(x,
                  plots.per.page = 4,
                  granule = 5,
                  sort = TRUE, 
                  partial = FALSE,
                  predictorNames = NULL,
                  n.pred = NULL,
                  n.pts = 25,
                  ...)

Arguments

x An object of class (rsf, grow) or (rsf, predict).
plots.per.page Integer value controlling page layout.
granule Integer value controlling whether the plot for a given predictor is drawn as a boxplot or a scatter plot. Larger values cause more predictors to be drawn as boxplots.
sort Should predictors be sorted by importance values (only applies if importance values are available)? Default is TRUE.
partial Logical. Should partial plots be created? Default is FALSE.
predictorNames Character vector of predictor names. Only these predictors will be plotted. Default is all.
n.pred Number of predictors to be plotted. Default is all.
n.pts Maximum number of points used when generating partial plots for continuous predictors.
... Further arguments passed to or from other methods.

Details

Ensemble mortality is plotted against the value of a predictor. Ensemble mortality values (vertical axis) should be interpreted in terms of total number of deaths. For example, if individual i has a mortality value of 100, then 100 deaths would be expected in the dataset, on average, if all individuals had the same predictor values as i.

By default, marginal plots are created. Each point represents the estimated mortality of an individual i plotted against the value of the predictor for i. For continuous predictors, blue points correspond to events and black points to censored observations.
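The colouring rule for marginal plots can be illustrated with a minimal base R sketch. It does not use the package's plotting code. The variables age, status and mortality are synthetic stand-ins invented for this illustration, and the coding 1 = event, 0 = censored is an assumption.

```r
# Synthetic data, for illustration only (not output from rsf).
set.seed(2)
n <- 40
age       <- round(runif(n, 35, 80))             # a continuous predictor
status    <- rbinom(n, 1, 0.7)                   # assumed coding: 1 = event, 0 = censored
mortality <- 20 + 0.8 * age + rnorm(n, sd = 5)   # stand-in for ensemble mortality

# Blue for events, black for censored observations.
pt.col <- ifelse(status == 1, "blue", "black")

plot(age, mortality, col = pt.col, pch = 16,
     xlab = "age", ylab = "mortality")
```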

Partial plots are created when partial=TRUE. In this case, the mortality value being plotted for a predictor X evaluated at X=x is

\tilde{f}(x) = \frac{1}{n} \sum_{i=1}^n \hat{f}(x, x_{i,O}),

where x_{i,O} denotes the values of all predictors other than X for individual i, and \hat{f} is the ensemble mortality predictor. Generating partial plots can be very slow. Choosing a small value for n.pts reduces computation time because it restricts the number of distinct x values at which \tilde{f} is computed.
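The averaging in the formula above can be sketched in base R for any prediction function. This is a generic illustration, not the package's internal code: partial_dependence and predict_fun are hypothetical names, the rule used to thin the grid to n.pts values is an assumption, and a linear model on the built-in mtcars data stands in for the ensemble mortality predictor.

```r
# Generic partial-dependence sketch (hypothetical helper, not part of the package).
# predict_fun: function(newdata) returning one prediction per row.
partial_dependence <- function(predict_fun, data, var, n.pts = 25) {
  x.uniq <- sort(unique(data[[var]]))
  # Keep at most n.pts distinct x values (assumed thinning rule).
  if (length(x.uniq) > n.pts) {
    keep <- unique(round(seq(1, length(x.uniq), length.out = n.pts)))
    x.uniq <- x.uniq[keep]
  }
  f.tilde <- sapply(x.uniq, function(x) {
    newdata <- data
    newdata[[var]] <- x           # set X = x for every individual
    mean(predict_fun(newdata))    # average over the observed x_{i,O}
  })
  data.frame(x = x.uniq, partial = f.tilde)
}

# Toy usage: a linear model stands in for the ensemble predictor.
fit <- lm(mpg ~ wt + hp, data = mtcars)
pd <- partial_dependence(function(d) predict(fit, newdata = d),
                         mtcars, "wt", n.pts = 10)
```

Each grid value requires one pass of predictions over the whole dataset, which is why a smaller n.pts shortens the computation.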

For continuous predictors, red points indicate partial values, and dashed red lines represent a lowess-smoothed error bar of +/- two standard errors. The black dashed line is the lowess estimate of the partial values. For discrete predictors, partial values are shown as boxplots with whiskers extending approximately two standard errors from the mean. Standard errors are meant only as a guide and should be interpreted with caution.
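One plausible reading of the smoothed error bars for a continuous predictor is sketched below with lowess() from base R. The values of x, f and se are synthetic stand-ins for the grid, the partial values and their standard errors. Smoothing the two band edges separately is an assumption and may differ from what the package does.

```r
# Synthetic partial values and standard errors, for illustration only.
set.seed(1)
x  <- seq(0, 10, length.out = 25)
f  <- 50 + 3 * x + rnorm(25, sd = 2)   # stand-in for partial values
se <- runif(25, 1, 3)                  # stand-in for standard errors

mid   <- lowess(x, f)                  # lowess estimate of the partial values
upper <- lowess(x, f + 2 * se)         # smoothed upper band (+ two standard errors)
lower <- lowess(x, f - 2 * se)         # smoothed lower band (- two standard errors)

plot(x, f, col = "red", pch = 16,
     ylim = range(f, lower$y, upper$y),
     xlab = "predictor", ylab = "mortality")
lines(mid,   lty = 2, col = "black")
lines(upper, lty = 2, col = "red")
lines(lower, lty = 2, col = "red")
```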

Author(s)

Hemant Ishwaran hemant.ishwaran@gmail.com and Udaya B. Kogalur ubk2101@columbia.edu

References

H. Ishwaran and U.B. Kogalur (2006). Random Survival Forests. Cleveland Clinic Technical Report.

J.H. Friedman (2001). Greedy function approximation: a gradient boosting machine, Ann. of Stat., 29(5):1189-1232.

A. Liaw and M. Wiener (2002). Classification and regression by randomForest, R News, 2:18-22.

See Also

rsf, predict.rsf.

Examples

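  library(randomSurvivalForest)
  data(veteran, package = "randomSurvivalForest")
  # Grow a forest of 1000 trees; forest = TRUE returns the forest object
  v.out <- rsf(Survrsf(time, status) ~ ., veteran, forest = TRUE, ntree = 1000)
  # Marginal plots for all predictors, three per page
  plot.variable(v.out, plots.per.page = 3)
  # Marginal plots restricted to the named predictors
  plot.variable(v.out, plots.per.page = 2, predictorNames = c("trt", "karno", "age"))
  # Partial plots for three predictors
  plot.variable(v.out, partial = TRUE, plots.per.page = 2, n.pred = 3)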

[Package randomSurvivalForest version 2.0.0 Index]