prodlim {prodlim} | R Documentation |
Nonparametric estimation in event history analysis. Featuring fast algorithms and user friendly syntax adapted from the survival package. The product limit algorithm is used for right censored data; the self-consistency algorithm for interval censored data.
prodlim(formula, data=parent.frame(), subset, na.action, reverse=FALSE, conf.int=0.95, bandwidth=NULL, discrete.level=3, maxiter=1000, grid, tol=7, ml=FALSE)
formula |
A formula whose left hand side is
a Hist object. In some special cases it can also be a
Surv response object, see the details section. The right hand side is as
usual a linear combination of covariates which may
contain at most one continuous factor. Whether or not a covariate is
recognized as continuous or discrete depends on its class and on the
argument discrete.level . The right hand side may also be used
to specify clusters, see the details section.
|
data |
A data.frame in which all the variables of
formula can be interpreted.
|
subset |
Expression identifying a subset of the rows of the data for the fitting process. |
na.action |
A missing-data filter function, applied to the model.frame,
after any subset argument has been used. Default is
options()$na.action .
|
reverse |
For right censored data, if reverse=TRUE then the censoring distribution is estimated. |
conf.int |
The level (between 0 and 1) for two-sided pointwise confidence intervals. Defaults to 0.95. |
bandwidth |
Smoothing parameter for symmetric nearest neighborhoods
based on the values of a continuous covariate. See function
neighborhood for details.
|
discrete.level |
Numeric covariates are treated as factors when their number
of unique values exceeds not discrete.level . Otherwise
the product limit method is applied, in overlapping neighborhoods
according to the bandwidth.
|
maxiter |
Maximal number of iterations to obtain the nonparametric maximum likelihood estimate. Defaults to 1000. |
grid |
When ml=FALSE grid for one-step product limit estimate. Defaults to sorted list of unique left and right endpoints of the observed intervals. |
tol |
Numeric value whose negative exponential is used as convergence criterion for finding the nonparametric maximum likelihood estimate. Defaults to 7 meaning exp(-7). |
ml |
If TRUE (not the default!) use the usual Turnbull algorithm, else the product limit version of the self-consistent estimate. |
The response of formula
(ie the left hand side of the `~'
operator) specifies the model.
In two-state models – the classic survival case – the standard
Kaplan-Meier method is applied. For this the response can be
specified as a Surv
or as a Hist
object. Besides a
slight gain of computing efficiency, there are some extensions that
are not included in the current version of the survival package:
(0) The Kaplan-Meier estimator for the censoring times
reverse=TRUE
is correctly
estimated when there are ties between event and censoring
times.
(1) A conditional version of the Kaplan-Meier estimator for at most one continuous predictors using symmetrical nearest neighborhoods (Beran 1981, Stute 1984, Akritas 1994).
(2) For cluster-correlated data the right hand side of
formula
may identify a cluster
variable. In that case
Greenwood's variance formula is replaced by the formula of Ying & Wei
(1994).
(3) Competing risk models can be specified via
Hist
response objects in formula
.
The Aalen-Johannsen estimator is applied for estimating the cumulative
incidence functions for all causes.
The advantage over the function cuminc
of the cmprsk
package are user-friendly model specification via Hist
and sophisticated print, summary, predict and plot methods.
Under construction:
(U0) Interval censored event times specified via
Hist
are used to find the nonparametric maximum
likelihood estimate. Currently this works only for two-state models
and the results should match with those from the package `Icens'.
(U1) Extensions to more complex multi-states models
(U2) The nonparametric maximum likelihood estimate for interval censored observations of competing risks models.
Object of class "prodlim" for which there are print, predict, life.table, summary and plot methods.
Thomas A. Gerds tag@biostat.ku.dk
Andersen, Borgan, Gill, Keiding (1993) Springer `Statistical Models Based on Counting Processes'
Akritas (1994) The Annals of Statistics 22, 1299-1327 Nearest neighbor estimation of a bivariate distribution under random censoring.
R Beran (1981) http://anson.ucdavis.edu/~beran/paper.html `Nonparametric regression with randomly censored survival data'
Stute (1984) The Annals of Statistics 12, 917–926 `Asymptotic Normality of Nearest Neighbor Regression Function Estimates'
Ying, Wei (1994) Journal of Multivariate Analysis 50, 17-29 The Kaplan-Meier estimate for dependent failure time observations
predictSurv
,
predictSurvIndividual
,
predictCuminc
,
Hist
,
neighborhood
,
Surv
,
survfit
,
strata
,
cluster
,
NN
##---------------------two-state survival model------------ dat <- SimSurv(100) with(dat,plot(Hist(time,status))) fit <- prodlim(Hist(time,status)~1,data=dat) print(fit) plot(fit) summary(fit) ## --------------------clustered data--------------------- dat <- SimSurv(100,cova=list(patnr=list("rbinom",30,.5)),surv.baseline=1/10000000) fit <- prodlim(Hist(time,status)~cluster(patnr),data=dat) print(fit) plot(fit) summary(fit) ##-----------compare Kaplan-Meier to survival package--------- pfit <- prodlim(Surv(time,status)~1,data=pbc) pfit <- prodlim(Hist(time,status)~1,data=pbc) ## same thing sfit <- survfit(Surv(time,status)~1,data=pbc,conf.type="plain") ## same result for the survival distribution function all(round(pfit$surv,12)==round(sfit$surv,12)) summary(pfit,digits=3) summary(sfit,times=quantile(unique(pbc$time))) ##------------------censoring survival function---------------- rpfit <- prodlim(Hist(time,status)~1,data=pbc,reverse=TRUE) rsfit <- survfit(Surv(time,1-status)~1,data=pbc,conf.type="plain") ## not the same if there are ties! all(round(rpfit$surv,3)==round(rsfit$surv,3)) ##-------------------stratified Kaplan-Meier--------------------- pfit.edema <- prodlim(Surv(time,status)~edema,data=pbc) summary(pfit.edema) summary(pfit.edema,intervals=TRUE) plot(pfit.edema) ##----------continuous covariate: Stone-Beran estimate------------ prodlim(Surv(time,status)~age,data=pbc) ##-------------both discrete and continuous covariates------------ prodlim(Surv(time,status)~edema+age,data=pbc) ##----------------------interval censored data---------------------- dat <- data.frame(L=1:10,R=c(2,3,12,8,9,10,7,12,12,12),status=c(1,1,0,1,1,1,1,0,0,0)) with(dat,Hist(time=list(L,R),event=status)) npmle.fitml <- prodlim(Hist(time=list(L,R),event=1)~1,data=dat,ml=TRUE) ##-------------competing risks------------------- CompRiskFrame <- data.frame(time=1:100,event=rbinom(100,2,.5),X=rbinom(100,1,.5)) crFit <- prodlim(Hist(time,event)~X,data=CompRiskFrame) summary(crFit) plot(crFit) summary(crFit,cause=2) plot(crFit,cause=2)