el.cen.EM {emplik}    R Documentation

Empirical likelihood ratio for mean with right, left or doubly censored data, by EM algorithm

Description

This program uses the EM algorithm to compute the maximized (with respect to p_i) empirical log likelihood function for right, left or doubly censored data under the MEAN constraint:

sum_{d_i=1} p_i f(x_i) = int f(t) dF(t) = μ .

Here p_i = Delta F(x_i) is a probability and d_i is the censoring indicator: 1 (uncensored), 0 (right censored), 2 (left censored). The function also returns those p_i.

The empirical log likelihood being maximized is

sum_{d_i=1} log Delta F(x_i) + sum_{d_i=0} log [1-F(x_i)] + sum_{d_i=2} log F(x_i) .
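
The log likelihood above can be evaluated directly once the jumps p_i are known. The following is a minimal sketch in base R, not the package's own code: the helper name cen_loglik is hypothetical, and it assumes one jump per uncensored observation, given in the same order as the uncensored entries of x.

```r
# Hypothetical helper (not part of emplik): evaluate the censored
# empirical log likelihood for jumps p placed at the uncensored times.
cen_loglik <- function(x, d, p) {
  t_unc <- x[d == 1]                       # support points of F
  stopifnot(length(p) == length(t_unc), abs(sum(p) - 1) < 1e-8)
  Fhat <- function(s) sum(p[t_unc <= s])   # F(s) as a sum of jumps
  ll_unc   <- sum(log(p))                                  # d_i = 1 terms
  ll_right <- sum(vapply(x[d == 0],
                         function(s) log(1 - Fhat(s)), numeric(1)))  # d_i = 0
  ll_left  <- sum(vapply(x[d == 2],
                         function(s) log(Fhat(s)), numeric(1)))      # d_i = 2
  ll_unc + ll_right + ll_left
}

# Small illustration: one right censored point at 2
x <- c(1, 2, 3, 4)
d <- c(1, 0, 1, 1)
p <- c(0.25, 0.375, 0.375)   # jumps at 1, 3 and 4
cen_loglik(x, d, p)
```

The jumps used here are the Kaplan-Meier jumps for this small data set, so the value is the unconstrained maximum of the log likelihood for these four observations.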

Usage

el.cen.EM(x,d,fun=function(t){t},mu,maxit=25,error=1e-9,...)

Arguments

x a vector containing the observed survival times.
d a vector containing the censoring indicators, 1-uncensored; 0-right censored; 2-left censored.
fun a continuous (weight) function used to calculate the mean as in H_0. fun(t) must be able to take a vector input t. Defaults to the identity function f(t)=t.
mu a real number used in the constraint, mean value of f(X).
maxit an optional integer, used to control maximum number of iterations.
error an optional positive real number specifying the tolerance of iteration error. This is the bound of the L_1 norm of the difference of two successive weights.
... additional arguments, if any, to pass to fun.

Details

This implementation is written entirely in R and contains several for-loops. A faster version would use C for the for-loop part, but this version seems fast enough and is easier to port to Splus.

We always return the log likelihood. In some cases (right censored data, or no censoring) we also return the -2 log likelihood ratio. In other cases, you have to plot a curve over many values of the parameter mu to find where the log likelihood attains its maximum. From there you can get the -2 log likelihood ratio between the maximum location and your current parameter in H_0.
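
The grid approach described above can be sketched as follows. This is only an illustration in base R: the helper name neg2llr_profile is hypothetical, and the loglik values below are made up. In practice they would be collected from the loglik component returned by el.cen.EM at each value of mu on the grid.

```r
# Hypothetical helper: turn a profile of log likelihood values (one per
# candidate mu) into -2 log likelihood ratios against the profile maximum.
neg2llr_profile <- function(mu_grid, loglik) {
  stopifnot(length(mu_grid) == length(loglik))
  i_max <- which.max(loglik)               # grid point with the largest loglik
  list(mu_hat  = mu_grid[i_max],
       neg2llr = 2 * (loglik[i_max] - loglik))
}

# Made-up loglik values, for illustration only
mu_grid <- c(3.0, 3.5, 4.0, 4.5)
loglik  <- c(-20.4, -19.9, -19.6, -20.1)
prof <- neg2llr_profile(mu_grid, loglik)
prof$mu_hat
prof$neg2llr
```

The maximum found this way is only as accurate as the grid, so a finer grid near the apparent maximum may be needed.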

To obtain a proper distribution as the NPMLE, we automatically change d for the largest observation to 1 (even if it is right censored), and similarly for the smallest observation if it is left censored. μ is a given constant. When the given constant μ is too far from the NPMLE, no distribution satisfies the constraint. In this case the computation stops, and the -2 log empirical likelihood ratio should be taken as infinite.
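
The change to the censoring indicators described above might look like the sketch below. The helper name adjust_d is hypothetical and this is not the package's internal code. In particular, how ties at the largest or smallest observation are handled is an assumption here: all tied extreme observations are changed.

```r
# Hypothetical helper: make the extreme observations uncensored so that
# the NPMLE is a proper distribution.
adjust_d <- function(x, d) {
  d[x == max(x) & d == 0] <- 1   # largest observation: right censored -> uncensored
  d[x == min(x) & d == 2] <- 1   # smallest observation: left censored -> uncensored
  d
}

d_new <- adjust_d(c(1, 2, 3, 4), c(2, 1, 0, 0))
d_new
```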

The constant mu must be inside ( min f(x_i) , max f(x_i) ) for the computation to continue. The NPMLE values are always feasible, so when the computation stops, try moving mu closer to the NPMLE value:

sum_{d_i=1} p_i^0 f(x_i)

where p_i^0 are the jumps of the NPMLE of the CDF. Alternatively, use a different fun.
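
The range condition on mu can be checked before calling el.cen.EM. The sketch below uses base R only. The helper name mu_in_range is hypothetical. Passing this check is necessary but does not guarantee that the constraint is feasible.

```r
# Hypothetical helper: is mu strictly inside ( min f(x_i), max f(x_i) )?
mu_in_range <- function(x, mu, fun = function(t) t, ...) {
  fx <- fun(x, ...)              # fun must accept a vector, as el.cen.EM requires
  mu > min(fx) && mu < max(fx)
}

x <- c(1, 1.5, 2, 3, 4, 5, 6, 5, 4, 1, 2, 4.5)
mu_in_range(x, mu = 3.5)   # inside (1, 6)
mu_in_range(x, mu = 7)     # outside (1, 6)
```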

Value

A list with the following components:

loglik the maximized empirical log likelihood under the constraint.
times locations of CDF that have positive mass.
prob the jump size of CDF at those locations.
"-2LLR" If available, minus two times the empirical log likelihood ratio. Should be approximately chi-square distributed under H_0.
Pval The P-value of the test, using chi-square approximation.
lam The Lagrange multiplier. Added 5/2007.
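
The relation between "-2LLR" and Pval under the chi-square approximation can be sketched as below. The helper name pval_from_neg2llr is hypothetical, and one degree of freedom is assumed because there is a single mean constraint.

```r
# Hypothetical helper: upper tail chi-square probability for a -2LLR value.
# df = 1 is an assumption based on the single mean constraint.
pval_from_neg2llr <- function(neg2llr, df = 1) {
  pchisq(neg2llr, df = df, lower.tail = FALSE)
}

pval_from_neg2llr(1.2466)
```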

Author(s)

Mai Zhou

References

Zhou, M. (2002). Computing censored empirical likelihood ratio by EM algorithm. JCGS

Murphy, S. and van der Vaart, A. (1997). Semiparametric likelihood ratio inference. Ann. Statist. 25, 1471-1509.

Examples

## example with tied observations
x <- c(1, 1.5, 2, 3, 4, 5, 6, 5, 4, 1, 2, 4.5)
d <- c(1,   1, 0, 1, 0, 1, 1, 1, 1, 0, 0,   1)
el.cen.EM(x,d,mu=3.5)
## we should get "-2LLR" = 1.2466....
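## smoothed indicator of (x <= theta), with smoothing bandwidth eps
myfun5 <- function(x, theta, eps) {
  u <- (x-theta)*sqrt(5)/eps
  INDE <- (u < sqrt(5)) & (u > -sqrt(5))   # points inside the smoothing window
  u[u >= sqrt(5)] <- 0                     # well above theta: weight 0
  u[u <= -sqrt(5)] <- 1                    # well below theta: weight 1
  y <- 0.5 - (u - (u)^3/15)*3/(4*sqrt(5))  # smooth transition from 1 down to 0
  u[ INDE ] <- y[ INDE ]
  return(u)
}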
el.cen.EM(x, d, fun=myfun5, mu=0.5, theta=3.5, eps=0.1)

[Package emplik version 0.9-5 Index]