cenfit {NADA} | R Documentation |
Computes an estimate of an empirical cumulative distribution function (ECDF) for censored data using the Kaplan-Meier method.
cenfit(formula, na.action, se.fit=T, conf.int=.95, conf.type=c("log","log-log","plain","none"), conf.lower=c("usual", "peto", "modified"))
formula |
The formula must have a Cen object as the response on the
left of the ~ operator and, if desired, terms separated by +
operators on the right. One of the terms may be a strata
object. For a single survival curve the "~ 1" part of the
formula is not required.
|
na.action |
a missing-data filter function, applied to the model frame.
Default is options()$na.action.
|
se.fit |
a logical value indicating whether standard errors should be computed.
Default is TRUE.
|
conf.int |
the level for a two-sided confidence interval on the survival curve(s). Default is 0.95. |
conf.type |
One of "none", "plain", "log" (the default),
or "log-log". Only enough of the string to uniquely identify
it is necessary. "none" causes confidence intervals not to
be generated. "plain" gives the standard intervals curve
+/- k * se(curve), where k is determined from conf.int.
"log" calculates intervals based on the cumulative hazard,
or log(probability). "log-log" bases intervals on the log hazard,
or log(-log(probability)); these intervals never extend past 0 or 1.
|
conf.lower |
controls modified lower limits to the curve; the upper limit remains
unchanged. The modified lower limit is based on an 'effective n'
argument. The confidence bands will agree with the usual calculation
at each detected value, but unlike the usual bands the
confidence interval becomes wider at each censored observation.
The extra width is obtained by multiplying the usual variance by
a factor m/n, where n is the number currently at risk and m is the
number at risk at the last detected observation. (The bands thus
agree with the unmodified bands at each detected observation.)
This is especially useful for ECDF curves with a long flat tail.
The Peto lower limit is based on the same 'effective n' argument as the modified limit, but also replaces the usual Greenwood variance term with a simple approximation. It is known to be conservative. |
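The interval constructions described under conf.type can be sketched in base R. This is an illustration under stated assumptions, not the code used by NADA or survival: ci_sketch is a hypothetical helper, p is a survival-type probability strictly between 0 and 1, se_log is assumed to be the standard error of log(p), and the plain interval uses the delta-method approximation se(p) = p * se_log.

```r
# Hypothetical helper (not part of NADA): delta-method sketch of the
# "plain", "log" and "log-log" interval types for a probability p.
# Assumes 0 < p < 1 and that se_log is the standard error of log(p).
ci_sketch <- function(p, se_log, conf.int = 0.95,
                      conf.type = c("log", "log-log", "plain")) {
  conf.type <- match.arg(conf.type)     # partial strings are accepted
  k <- qnorm(1 - (1 - conf.int) / 2)    # two-sided normal quantile
  switch(conf.type,
    plain = c(lower = p - k * p * se_log,
              upper = p + k * p * se_log),
    log = c(lower = exp(log(p) - k * se_log),
            upper = exp(log(p) + k * se_log)),
    "log-log" = {
      ll    <- log(-log(p))
      se_ll <- se_log / abs(log(p))     # se of log(-log(p))
      c(lower = exp(-exp(ll + k * se_ll)),
        upper = exp(-exp(ll - k * se_ll)))
    })
}

ci_sketch(0.9, 0.1, conf.type = "plain")    # upper limit can exceed 1
ci_sketch(0.9, 0.1, conf.type = "log-log")  # stays inside (0, 1)
```

With a probability near 1 and a sizeable standard error, the plain limits can leave the unit interval, while the log-log limits cannot, which is the behaviour the conf.type description refers to.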
This, and related routines, are front ends to routines in the
survival package. Since the survival routines cannot handle
left-censored data, these routines transparently handle "flipping" the input
data and the resulting calculations. Query and prediction methods for cenfit
objects are also provided (see below).
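The "flipping" idea can be illustrated in a few lines of base R. This is a sketch, not NADA's internal code: the flip constant flip_const is an assumption (any value larger than the largest observation would do here), and the variable names are illustrative only. Subtracting each observation from the constant turns "less than" values into "greater than" values, that is, left-censoring into right-censoring, and the same subtraction undoes the transform.

```r
obs      <- c(0.5, 0.5, 1.0, 1.5, 5.0, 10, 100)
censored <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)

# Assumed flip constant: any value above max(obs) works for this sketch.
flip_const <- max(obs) + 1

flipped <- flip_const - obs   # "< 0.5" becomes "> 100.5" (right-censored)
event   <- !censored          # survival-style status: TRUE = detected

# A survival probability S(t) estimated on the flipped scale corresponds to
# P(X < x) on the original scale at x = flip_const - t.
back <- flip_const - flipped  # flipping twice recovers the original values
stopifnot(isTRUE(all.equal(back, obs)))
```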
The ECDF estimates used are the Kalbfleisch-Prentice variety (Kalbfleisch and Prentice, 1980, p. 86), which reduce to the Kaplan-Meier estimate when weights are unity.
The Greenwood formula for the variance is used to calculate standard errors. The Greenwood formula uses a sum of terms d/(n*(n-m)), where d is the number of detections at a given concentration, n is the sum of 'weights' for all individuals at or below that concentration, and m is the sum of 'weights' for the detected observations at that concentration. The justification is based on a binomial argument when weights are all equal to one; extension to the weighted case is ad hoc. Tsiatis (1981) proposes a sum of terms d/(n*n), based on a counting process argument which includes the weighted case.
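The two variance sums can be compared with a small base R sketch. The assumptions are: unit weights (so d and m coincide), input already on the right-censored (flipped) scale, and a hypothetical helper km_greenwood that is not part of NADA or survival. The cumulative sums are used here as the variance of log(surv); the data in the call are made-up numbers for illustration only.

```r
# Hypothetical helper (not part of NADA or survival): Kaplan-Meier estimate
# with Greenwood and Tsiatis-style variance sums, assuming unit weights and
# right-censored input, where event = TRUE marks an observed (detected) value.
km_greenwood <- function(time, event) {
  t_ev <- sort(unique(time[event]))                                  # distinct event times
  n <- vapply(t_ev, function(t) sum(time >= t), numeric(1))          # number at risk
  d <- vapply(t_ev, function(t) sum(time == t & event), numeric(1))  # events at t
  data.frame(time = t_ev, n.risk = n, n.event = d,
             surv = cumprod(1 - d / n),
             # Greenwood sum d / (n * (n - d)); infinite once d == n
             se.greenwood = sqrt(cumsum(d / (n * (n - d)))),
             # Tsiatis-style sum d / n^2
             se.tsiatis = sqrt(cumsum(d / (n * n))))
}

# Made-up right-censored data, for illustration only
km_greenwood(time  = c(1, 2, 2, 3, 4, 5),
             event = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE))
```

Each Tsiatis-style term d/(n*n) is no larger than the matching Greenwood term d/(n*(n-d)), so the Tsiatis-style standard errors in this sketch never exceed the Greenwood ones.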
a cenfit object.
Methods defined for cenfit objects are provided for print, plot, lines, predict, mean, median, sd, quantile.
If the input formula contained factoring groups (for example, cenfit(Cen(obs, censored)~groups)), individual ECDFs can be obtained by indexing (e.g., model[1]).
Lopaka(Rob) Lee <rclee@usgs.gov>
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Dorey, F. J. and Korn, E. L. (1987). Effective sample sizes for confidence intervals for survival probabilities. Statistics in Medicine 6, 679-87.
Fleming, T. H. and Harrington, D.P. (1984). Nonparametric estimation of the survival distribution in censored data. Comm. in Statistics 13, 2469-86.
Kalbfleisch, J. D. and Prentice, R. L. (1980). The Statistical Analysis of Failure Time Data. Wiley, New York.
Link, C. L. (1984). Confidence intervals for the survival function using Cox's proportional hazards model with covariates. Biometrics 40, 601-610.
Tsiatis, A. (1981). A large sample study of the estimate for the integrated hazard function in Cox's regression model for survival data. Annals of Statistics 9, 93-108.
Cen, predict.cenfit, quantile.cenfit, median.cenfit, mean.cenfit, sd.cenfit, cendiff, survfit
# Fit a Kaplan-Meier ECDF, then plot and summarize it.
library(NADA)
obs <- c(0.5, 0.5, 1.0, 1.5, 5.0, 10, 100)
censored <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
mycenfit <- cenfit(Cen(obs, censored))
plot(mycenfit)
summary(mycenfit)