el2.cen.EMm {emplik2} | R Documentation
This function uses the EM algorithm to calculate a maximized empirical likelihood ratio for a set of p hypotheses as follows:
H_o: E(g(x,y)-mean)=0
where E indicates expected value; g(x,y) is a vector of user-defined functions g_1(x,y), ..., g_p(x,y); and mean is a vector of p hypothesized values of E(g(x,y)). The two samples x and y are assumed independent. They may be uncensored, right-censored, left-censored, or left-and-right ("doubly") censored. A p-value for H_o is also calculated, based on the assumption that -2*log(empirical likelihood ratio) is asymptotically chi-squared distributed with p degrees of freedom.
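As a small illustration of the p-value calculation described above, the following sketch evaluates the upper tail of a chi-squared distribution with p degrees of freedom. The value of llr is made up for illustration; in practice it would be the "-2LLR" element returned by el2.cen.EMm.

```r
# Hypothetical inputs, for illustration only.
llr <- 1.0   # stands in for -2 * log(empirical likelihood ratio)
p   <- 2     # number of hypotheses

# Upper-tail chi-squared probability with p degrees of freedom;
# equivalent to 1 - pchisq(llr, df = p).
pval <- pchisq(llr, df = p, lower.tail = FALSE)
print(pval)
```

Using lower.tail = FALSE avoids the subtraction from 1, which can lose precision when the p-value is very small.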
el2.cen.EMm(x, dx, y, dy, p, H, xc=1:length(x), yc=1:length(y), mean, maxit=10)
x | a vector of the data for the first sample
dx | a vector of the censoring indicators for x: 0=right-censored, 1=uncensored, 2=left-censored
y | a vector of the data for the second sample
dy | a vector of the censoring indicators for y: 0=right-censored, 1=uncensored, 2=left-censored
p | the number of hypotheses
H | a matrix defined as H = [H_1, H_2, ..., H_p], where H_k = [g_k(x_i,y_j)-mu_k], k=1, ..., p
xc | a vector containing the indices of the x datapoints
yc | a vector containing the indices of the y datapoints
mean | the hypothesized value of E(g(x,y))
maxit | a positive integer used to control the maximum number of iterations of the EM algorithm; default is 10
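One possible way to assemble H without explicit loops is sketched below. It follows the layout used in the example at the end of this page, which fills each block with the values g_k(x_i,y_j) and places the p blocks side by side. The data, the cutoff, and the function names g1 and g2 are made up for illustration and are not part of emplik2.

```r
# Toy data, made up for illustration.
x <- c(3, 7, 12, 20)
y <- c(5, 9, 15)
cutoff <- 10   # hypothetical threshold used by the second function

# g_1(x, y) = 1 if x > y, else 0; g_2(x, y) = 1 if x > cutoff, else 0.
# Both must be vectorized, because outer() passes whole vectors.
g1 <- function(a, b) as.numeric(a > b)
g2 <- function(a, b) as.numeric(a > cutoff)

# outer() evaluates the function on every (x_i, y_j) pair and returns
# a length(x) by length(y) matrix.
H1 <- outer(x, y, g1)
H2 <- outer(x, y, g2)

# Side-by-side blocks: length(x) rows and p * length(y) columns.
H <- cbind(H1, H2)
print(dim(H))
```

Here cbind(H1, H2) gives the same matrix as matrix(c(H1, H2), nrow = length(x), ncol = 2 * length(y)), since R stores matrices column by column.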
The value of mean_k should be chosen between the maximum and minimum values of g_k(x_i,y_j); otherwise there may be no distributions for x and y that will satisfy H_o. If mean_k is inside this interval, but the convergence is still not satisfactory, then the value of mean_k should be moved closer to the NPMLE for E(g_k(x,y)). (The NPMLE itself should always be a feasible value for mean_k.)
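The range condition above can be checked before calling el2.cen.EMm. The sketch below is a hypothetical helper, not part of emplik2. It assumes that H holds the values g_k(x_i,y_j) in p side-by-side blocks of ny columns each, and it uses strict inequalities, which is an assumption of this sketch.

```r
# Hypothetical helper: for each k, is mean[k] strictly between the minimum
# and maximum of g_k(x_i, y_j)?  Assumes block k of H occupies columns
# (k - 1) * ny + 1 through k * ny.
mean_in_range <- function(H, mean, ny) {
  vapply(seq_along(mean), function(k) {
    cols <- ((k - 1) * ny + 1):(k * ny)
    gk <- H[, cols, drop = FALSE]
    min(gk) < mean[k] && mean[k] < max(gk)
  }, logical(1))
}

# Toy input: the first block takes values 0 and 1, the second is constant 1.
H_toy <- cbind(matrix(c(0, 1, 1, 0), nrow = 2), matrix(1, nrow = 2, ncol = 2))
ok <- mean_in_range(H_toy, mean = c(0.5, 0.5), ny = 2)
print(ok)
```

A FALSE entry flags a hypothesized value that no pair of distributions supported on the data can satisfy, so it is worth checking before running the EM iterations.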
el2.cen.EMm returns a list of values as follows:
xd1 | a vector of the unique, uncensored x-values in ascending order
yd1 | a vector of the unique, uncensored y-values in ascending order
temp3 | a list of values returned by the el2.test.wtm function (which is called by el2.cen.EMm)
mean | the hypothesized value of E(g(x,y))
NPMLE | the non-parametric-maximum-likelihood-estimator vector of E(g(x,y))
logel00 | the log of the unconstrained empirical likelihood
logel | the log of the constrained empirical likelihood
"-2LLR" | -2*(log-likelihood-ratio) for the p simultaneous hypotheses
Pval | the p-value for the p simultaneous hypotheses, equal to 1 - pchisq(-2LLR, df = p)
logvec | the vector of successive values of logel computed by the EM algorithm (should converge toward a fixed value; can be used to assess convergence of the EM algorithm)
sum_muvec | sum of the probability jumps for the uncensored x-values, should be 1
sum_nuvec | sum of the probability jumps for the uncensored y-values, should be 1
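The diagnostics logvec, sum_muvec, and sum_nuvec can be inspected together. The sketch below is a hypothetical helper, not part of emplik2; the tolerance of 1e-6 is an arbitrary choice, and the toy list only mimics the element names described above with made-up numbers.

```r
# Hypothetical helper: summarize the convergence diagnostics of a result list.
check_el2_fit <- function(res, tol = 1e-6) {
  n <- length(res$logvec)
  c(
    # Last two EM values of logel agree to within tol.
    logel_stable = n >= 2 && abs(res$logvec[n] - res$logvec[n - 1]) < tol,
    # Probability jumps sum to 1 for each sample.
    mu_sums_to_1 = abs(res$sum_muvec - 1) < tol,
    nu_sums_to_1 = abs(res$sum_nuvec - 1) < tol
  )
}

# Synthetic list with made-up values; not output from el2.cen.EMm.
toy <- list(logvec = c(-10.5, -10.1, -10.01, -10.0100001),
            sum_muvec = 1, sum_nuvec = 0.98)
status <- check_el2_fit(toy)
print(status)
```

If logel_stable is FALSE, increasing maxit is a natural first step; a sum that is not close to 1 suggests the hypothesized mean may need to be moved closer to the NPMLE.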
William H. Barton <bbarton@lexmark.com>
Barton, W. (2009). PhD dissertation at University of Kentucky, estimated completion Dec. 2009.
Chang, M. and Yang, G. (1987). "Strong Consistency of a Nonparametric Estimator of the Survival Function with Doubly Censored Data." Ann. Stat., 15, pp. 1536-1547.
Dempster, A., Laird, N., and Rubin, D. (1977). "Maximum Likelihood from Incomplete Data via the EM Algorithm." J. Roy. Statist. Soc., Series B, 39, pp. 1-38.
Gomez, G., Julia, O., and Utzet, F. (1992). "Survival Analysis for Left-Censored Data." In Klein, J. and Goel, P. (ed.), Survival Analysis: State of the Art. Kluwer Academic Publishers, Boston, pp. 269-288.
Li, G. (1995). "Nonparametric Likelihood Ratio Estimation of Probabilities for Truncated Data." J. Amer. Statist. Assoc., 90, pp. 997-1003.
Owen, A.B. (2001). Empirical Likelihood. Chapman and Hall/CRC, Boca Raton, pp. 223-227.
Turnbull, B. (1976). "The Empirical Distribution Function with Arbitrarily Grouped, Censored and Truncated Data." J. Roy. Statist. Soc., Series B, 38, pp. 290-295.
Zhou, M. (2005). "Empirical likelihood ratio with arbitrarily censored/truncated data by EM algorithm." J. Comput. Graph. Stat., 14, pp. 643-656.
Zhou, M. (2009). emplik package on CRAN website. Dr. Zhou is my PhD advisor at the University of Kentucky. My el2.cen.EMm function extends Dr. Zhou's el.cen.EM2 function from one sample to two samples.
library(emplik2)

# First sample and its censoring indicators
x <- c(10, 80, 209, 273, 279, 324, 391, 415, 566, 85, 852, 881, 895, 954, 1101,
       1133, 1337, 1393, 1408, 1444, 1513, 1585, 1669, 1823, 1941)
dx <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        0, 0, 1, 0, 0, 0, 0, 1, 1, 0)

# Second sample and its censoring indicators
y <- c(21, 38, 39, 51, 77, 185, 240, 289, 524, 610, 612, 677, 798, 881, 899,
       946, 1010, 1074, 1147, 1154, 1199, 1269, 1329, 1484, 1493, 1559, 1602,
       1684, 1900, 1952)
dy <- c(1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 1,
        1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0)

nx <- length(x)
ny <- length(y)
xc <- 1:nx
yc <- 1:ny
wx <- rep(1, nx)
wy <- rep(1, ny)
maxit <- 10
mean <- c(0.5, 0.5)
p <- 2

# Evaluate g_1 and g_2 on every (x_i, y_j) pair
H1 <- matrix(NA, nrow = nx, ncol = ny)
H2 <- matrix(NA, nrow = nx, ncol = ny)
for (i in 1:nx) {
  for (j in 1:ny) {
    H1[i, j] <- (x[i] > y[j])
    H2[i, j] <- (x[i] > 1060)
  }
}

# Place the two blocks side by side: nx rows, p*ny columns
H <- matrix(c(H1, H2), nrow = nx, ncol = p * ny)

# Ho1: P(X > Y) = 0.5 (X is stochastically equal to Y)
# Ho2: P(X > 1060) = 0.5
el2.cen.EMm(x, dx, y, dy, p, H, xc = 1:length(x), yc = 1:length(y), mean, maxit = 10)

# Result: Pval is 0.6310234, so we cannot reject the two simultaneous
# hypotheses Ho1 and Ho2 at the 5 percent significance level