d011 {dblcens}                R Documentation

Compute NPMLE of CDF from doubly censored data

Description

d011 computes the NPMLE of the CDF from doubly censored data via the EM algorithm, starting from an initial estimator that has jumps at (1) the uncensored points and (2) the mid-points of consecutive survival times whose censoring indicators follow the pattern (0,2) (see the d argument below for the definition of the indicators).
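
The rule for the jump locations of the initial estimator can be sketched in base R as follows. This is only an illustration of the description above, not the package's own code: the helper name initial_jump_points is hypothetical, and the input is assumed to be already sorted by time.

```r
# Hypothetical helper (not part of dblcens): locate the jump points of an
# initial estimator, assuming z is sorted and d is coded as
# 2 = left censored, 1 = uncensored, 0 = right censored.
initial_jump_points <- function(z, d) {
  n <- length(z)
  # (1) uncensored observations
  pts <- z[d == 1]
  # (2) mid-points of consecutive times whose indicators are (0, 2)
  if (n >= 2) {
    i <- which(d[-n] == 0 & d[-1] == 2)
    pts <- c(pts, (z[i] + z[i + 1]) / 2)
  }
  sort(unique(pts))
}

# Data from the first example on this page: the (0,2) pattern at
# times 2 and 3 contributes the mid-point 2.5.
initial_jump_points(z = c(1, 2, 3, 4, 5), d = c(1, 0, 2, 2, 1))
```

For these data the sketch returns the points 1, 2.5 and 5, which are the times at which the first example below reports non-zero jumps.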

When there are ties, left-censored observations are ordered first and right-censored observations last among the tied observations, in order to break the tie. Also, if the last observation is right censored, or the first observation is left censored, it is changed to uncensored so that the CDF estimator is a proper distribution. (This behavior can be modified easily, since the code is written in R.)
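
The tie-breaking and end-point conventions can be illustrated with a few lines of base R. This is a sketch under the assumption that sorting by time and then by decreasing d reproduces the ordering described above; it is not taken from the package source, and the variable names are made up for the illustration.

```r
# Toy data with three observations tied at time 3
z <- c(3, 1, 3, 3, 2)
d <- c(0, 1, 2, 1, 0)   # 2 = left censored, 1 = uncensored, 0 = right censored

# Sort by time; within ties put d = 2 first, then d = 1, then d = 0
ord <- order(z, -d)
zs <- z[ord]
ds <- d[ord]

# End-point convention: a left-censored first observation or a
# right-censored last observation is treated as uncensored
n <- length(ds)
if (ds[1] == 2) ds[1] <- 1
if (ds[n] == 0) ds[n] <- 1

cbind(time = zs, status = ds)
```

Here the tied observations at time 3 end up in the order left censored, uncensored, right censored, and the final right-censored observation is recoded as uncensored.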

It also computes the NPMLE of the two censoring distributions. Optionally, it can also attempt to compute the three influence functions.

Usage

d011(z, d, identical = rep(0, length(z)),
         maxiter = 49, error = 0.00001, influence.fun = FALSE)

Arguments

z A vector of length n of observed times (ties permitted).
d A vector of length n containing the censoring indicators: d = 2, 1, or 0 according to whether z is left censored, uncensored, or right censored.
identical Optional. A vector of length n with values 0 or 1. identical[i] = 1 means that even if (z[i], d[i]) is identical to (z[j], d[j]) for some j != i, they remain two separate observations (rather than one observation with weight 2, which happens only if identical[i] = 0 and identical[j] = 0). One reason to keep them separate is that they may have different covariates not shown here, which adds flexibility for regression applications. The default is identical = 0 for all observations, i.e. identical observations are collapsed.
maxiter Optional integer; the maximum number of iterations. Default is 49.
error Optional. Tolerance for the error between successive iterations. Default is 0.00001.
influence.fun Optional. Default is FALSE. If TRUE, the code tries to compute the three influence functions at the censored times. This computation can be very slow and memory intensive for data with more than 500 censored times.
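
The effect of the identical argument can be pictured with the following base R sketch. It only illustrates the collapsing rule stated above; the helper collapse_obs and its argument name ident are hypothetical, and the package may organize this step differently.

```r
# Hypothetical helper (not part of dblcens): collapse observations with the
# same (z, d) into one row with a weight, unless they are flagged ident = 1.
collapse_obs <- function(z, d, ident = rep(0, length(z))) {
  keep <- ident == 1
  # Flagged observations always stay separate, each with weight 1
  sep <- data.frame(z = z[keep], d = d[keep], w = rep(1, sum(keep)))
  # Unflagged observations are merged when their (z, d) pairs coincide
  rest <- data.frame(z = z[!keep], d = d[!keep], w = rep(1, sum(!keep)))
  if (nrow(rest) > 0) {
    rest <- aggregate(w ~ z + d, data = rest, FUN = sum)
  }
  out <- rbind(rest, sep)
  out <- out[order(out$z, -out$d), ]
  rownames(out) <- NULL
  out
}

z <- c(1, 2, 2, 3)
d <- c(1, 0, 0, 1)
collapse_obs(z, d)                        # the two (2, 0) rows merge, weight 2
collapse_obs(z, d, ident = c(0, 1, 0, 0)) # the flagged (2, 0) row stays separate
```

With the default flags the four observations reduce to three weighted rows; flagging one of the duplicates keeps all four rows with weight 1.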

Value

A list containing the NPMLE of the CDF and other information:

time Times from the input z, with the times corresponding to status = 2 removed.
status Censoring status of the above times. Status = -1 means this is an added time because of the censoring pattern (0,2).
surv Survival probability at the above times.
jump Jumps of the NPMLE at the above times.
exttime Similar to time, but with the times corresponding to status = 2 retained.
extstatus Status of exttime.
extjump Jump of the NPMLE at exttime.
extsurv.Sx Estimated lifetime distribution.
surv0.Sy One of the censoring distributions.
jump0 Jump of surv0.Sy
surv2.Sz Another censoring distribution.
jump2 Jump of surv2.Sz
conv A vector of length 2: the actual number of iterations performed and the actual error between successive iterations. If the number of iterations equals the maxiter you set, the iteration has not converged.
Nodes Points where the influence function is computed.
IC1tu Influence function value at the nodes. See Chang (1990) for details.
IC1tu2 Influence function value at other points. See Chang (1990) for details.
IC2tu ditto IC1tu
IC3tu ditto IC1tu
VarFt Estimated variances of F(t) at the Nodes.
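
As a rough guide to reading the returned list, the sketch below checks convergence and the relation between jump and surv. To stay self-contained it does not call d011; instead it uses a stand-in list named fit whose values are copied from the printed output of the first example on this page, together with the default maxiter = 49.

```r
# Stand-in for the result of d011(); values copied from the first example
fit <- list(
  surv = c(0.5000351, 0.5000351, 0.3333177, 0.0000000),
  jump = c(0.4999649, 0.0000000, 0.1667174, 0.3333177),
  conv = c(33, 8.788214e-06)
)
maxiter <- 49

# The iteration converged if it stopped before reaching maxiter
converged <- fit$conv[1] < maxiter

# In this example the jumps sum to one (up to rounding), and surv equals
# one minus the cumulative sum of the jumps
total_mass <- sum(fit$jump)
surv_from_jump <- 1 - cumsum(fit$jump)
max_diff <- max(abs(surv_from_jump - fit$surv))

converged
total_mass
max_diff
```

If conv[1] equals maxiter, a reasonable next step is to rerun with a larger maxiter before trusting the estimates.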

Author(s)

Mai Zhou mai@ms.uky.edu, Li Lee.

References

Chang, M. N. and Yang, G. L. (1987). Strong consistency of a nonparametric estimator of the survival function with doubly censored data. Ann. Statist. 15, 1536-1547.

Turnbull, B. W. (1976). The empirical distribution function with arbitrarily grouped, censored and truncated data. J. Roy. Statist. Soc. Ser. B 38, 290-295.

Chang, M. N. (1990). Weak convergence in doubly censored data. Ann. Statist. 18, 390-405.

Chen, K. and Zhou, M. (2000). Nonparametric Hypothesis Testing and Confidence Intervals with Doubly Censored Data. Tech. Report, Univ. of Kentucky. See also Lifetime Data Analysis 9 (2003).

Examples

d011(z=c(1,2,3,4,5), d=c(1,0,2,2,1))
#
# you should get something like below (and more)
#
#       $time:
#       [1] 1.0 2.0 2.5 5.0
#
#       $status:
#       [1]  1  0 -1  1
#
#       (Notice that the times 3 and 4, corresponding to d=2, are removed,
#       and time 2.5 is added because there is a (0,2) pattern at times
#       2 and 3. The status indicator -1 shows that it is an added time.)
#
#       $surv
#       [1] 0.5000351 0.5000351 0.3333177 0.0000000
#
#       $jump
#       [1] 0.4999649 0.0000000 0.1667174 0.3333177
#
#       $exttime
#       [1] 1.0 2.0 2.5 3.0 4.0 5.0
#
#       $extstatus
#       [1]  1  0 -1  2  2  1
#
#       ...... 
#
#       $conv
#       [1] 3.300000e+01  8.788214e-06  ### did 33 iterations
#
# Note: the true NPMLE of surv is (1/2, 1/2, 1/3, 0) at times (1, 2, 2.5, 5).
###### Example 2.
d011(c(1,2,3,4,5), c(1,2,1,0,1), influence.fun=TRUE)
#     which gives
# ......
#$conv:
#[1] 3 0
#
#$Nodes:
#[1] 2 4
#
#$IC1tu:
#     [,1] [,2]
#[1,]   -1    0
#[2,]   -1   -2
#
#$IC2tu:
#           [,1] [,2]
#[1,]  0.0000000    0
#[2,] -0.3333333    0
#
#$IC3tu:
#     [,1]       [,2]
#[1,]   -1 -0.6666667
#[2,]   -1 -1.0000000
#
#$VarFt:
#[1] 0.24 0.24           ## est var of F(t) at t=nodes
#######################################################

[Package dblcens version 1.1.4 Index]