nested.coxph {NestedCohort} | R Documentation
nested.coxph fits the Cox model to estimate hazard ratios for covariates that are missing on some cohort members. All covariates may be continuous or categorical. nested.coxph requires knowledge of the variables that missingness depends on, with the missingness probability modeled through a glm sampling model. Often, the data are in the form of a case-control sample taken within a cohort. nested.coxph allows cases to have missing data, and can extract efficiency from auxiliary variables by including them in the sampling model. nested.coxph requires coxph from the survival package.
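The two-step idea described above (a glm sampling model, then a Cox model on the subjects with complete data) can be sketched with inverse-probability weights. This is a minimal illustration on simulated data, not the package's implementation: all variable names (aux, x, observed, p_hat) are hypothetical, and the sketch makes no attempt to reproduce the standard errors that nested.coxph computes.

```r
library(survival)

set.seed(1)
n <- 2000

# Simulated full cohort (all names are hypothetical)
aux    <- rbinom(n, 1, 0.4)                    # auxiliary variable, seen on everyone
x      <- rnorm(n, mean = 0.5 * aux)           # covariate, kept only on a subsample below
time   <- rexp(n, rate = 0.05 * exp(0.5 * x))  # true log hazard ratio for x is 0.5
cens   <- runif(n, 0, 30)
futime <- pmin(time, cens)
event  <- as.integer(time <= cens)

# Sampling depends on case status and on the auxiliary variable
p_true   <- ifelse(event == 1, 0.9, ifelse(aux == 1, 0.5, 0.2))
observed <- rbinom(n, 1, p_true)
cohort   <- data.frame(futime, event, aux, observed,
                       x = ifelse(observed == 1, x, NA))

# Step 1: sampling model for the probability that x is observed
sampmod <- glm(observed ~ event * aux,
               family = binomial(link = "logit"), data = cohort)
cohort$p_hat <- fitted(sampmod)

# Step 2: Cox model on the complete rows, weighted by 1 / p_hat
sub <- cohort[cohort$observed == 1, ]
fit <- coxph(Surv(futime, event) ~ x, data = sub,
             weights = 1 / p_hat, method = "breslow")

exp(coef(fit))  # weighted hazard ratio estimate for x
```

The weights up-weight sampled subjects who stand in for similar unsampled ones, which is why the sampling model has to include every variable that sampling depended on.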
nested.coxph(coxformula, samplingmod, data, outputsamplingmod = FALSE, glmlink = binomial(link = "logit"), glmcontrol = glm.control(epsilon = 1e-10, maxit = 10, trace = FALSE), coxphcontrol = coxph.control(eps = 1e-10, iter.max = 50), missvarwarn = TRUE, ...)
Required arguments:
coxformula | Standard coxph formula
samplingmod | Right side of the formula for the glm sampling model that models the probability of missingness
data | Data frame that contains all variables
Optional arguments:
outputsamplingmod | Output the sampling model; default is FALSE
glmlink | Family and link function for the sampling model; the default, binomial(link = "logit"), is logistic regression
glmcontrol | See glm.control
coxphcontrol | See coxph.control
missvarwarn |
... | Any additional arguments to be passed on to glm or coxph
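Because samplingmod holds only the right side of a formula, a response has to be supplied before glm can fit it. The sketch below shows one way such a string could be completed and which terms it expands to. The response name observed is a hypothetical stand-in for a 0/1 indicator of complete data; how nested.coxph builds the formula internally is not documented here.

```r
# Right-hand side as it would be passed to samplingmod
samplingmod <- "ec01*basehist"

# 'observed' is a hypothetical name for the 0/1 indicator of complete data
sampling_formula <- as.formula(paste("observed ~", samplingmod))
sampling_formula

# Terms implied by '*': both main effects and their interaction
term_labels <- attr(terms(sampling_formula), "term.labels")
term_labels
```

Writing ec01*basehist rather than ec01+basehist lets the sampling fraction differ in every combination of case status and baseline histology.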
If nested.coxph reports that the sampling model "failed to converge", the sampling model will be returned for your inspection. Note that if some sampling probabilities are estimated at 1, the model technically cannot converge, but the fitted probabilities get very close to 1, and nested.coxph will not report non-convergence in this situation.
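The situation with sampling probabilities of 1 can be reproduced with glm alone. In this sketch on simulated data (hypothetical names case and observed), every case is sampled, so the fitted probability for cases drifts toward 1 without reaching it. The control settings are the defaults from the usage above. The sketch shows only what glm itself reports, not how nested.coxph handles the result.

```r
set.seed(2)
n <- 400
case <- rbinom(n, 1, 0.25)

# Every case is sampled; controls are sampled with probability 0.3
observed <- ifelse(case == 1, 1, rbinom(n, 1, 0.3))

# suppressWarnings() hides the glm warnings this setup is expected to trigger
fit <- suppressWarnings(
  glm(observed ~ case, family = binomial(link = "logit"),
      control = glm.control(epsilon = 1e-10, maxit = 10, trace = FALSE))
)

fit$converged                        # convergence flag as reported by glm
p_hat <- fitted(fit)
max_p_cases <- max(p_hat[case == 1])
max_p_cases                          # very close to 1
```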
If outputsamplingmod = FALSE, the output is the hazard ratios and the coxph model. Any method for coxph objects will work on this output, so long as that method only requires consistent estimates of the parameters and their standard errors.
If outputsamplingmod = TRUE, then the sampling model is also returned, and the output is a list with components:
coxmod | The Cox model, of class coxph
samplingmod | The sampling model, of class glm
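Wald-type summaries are an example of output that needs only the parameter estimates and their standard errors. As a worked check, the sketch below recomputes the hazard ratio and 95% confidence limits for zncent from the coef and se(coef) values printed in the example output for the zinc data. The variable names are illustrative only.

```r
# Values copied from the zncent row of the example output
coef_zn <- -0.2498
se_zn   <- 0.1351

z  <- qnorm(0.975)                             # two-sided 95% normal quantile
hr <- exp(coef_zn)                             # hazard ratio
ci <- exp(coef_zn + c(-1, 1) * z * se_zn)      # Wald confidence limits

round(c(hr = hr, lower = ci[1], upper = ci[2]), 3)
```

These values agree with the exp(coef), lower .95 and upper .95 columns of that output after rounding.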
Requires the MASS library from the VR bundle that is available from the CRAN website.
Hormuzd A. Katki
Mark, S.D. and Katki, H.A. Specifying and Implementing Nonparametric and Semiparametric Survival Estimators in Two-Stage (sampled) Cohort Studies with Missing Case Data. Journal of the American Statistical Association, 2006, 101, 460-471.
Mark, S.D. and Katki, H. Influence function based variance estimation and missing data issues in case-cohort studies. Lifetime Data Analysis, 2001, 7, 329-342.
Christian C. Abnet, Barry Lai, You-Lin Qiao, Stefan Vogt, Xian-Mao Luo, Philip R. Taylor, Zhi-Wei Dong, Steven D. Mark, Sanford M. Dawsey. Zinc concentration in esophageal biopsies measured by X-ray fluorescence and cancer risk. To appear in Journal of the National Cancer Institute.
See Also: nested.stdsurv, zinc, nested.km, coxph, glm
## Simple analysis of zinc and esophageal cancer data:
## We sampled zinc (variable zncent) on a fraction of the subjects, with
## sampling fractions depending on cancer status and baseline histology.
## We observed the confounding variables on almost all subjects.
data(zinc)
coxmod <- nested.coxph(coxformula="Surv(futime01,ec01==1)~ sex+agepill+smoke+drink+mildysp+moddysp+sevdysp+anyhist+zncent",
                       samplingmod="ec01*basehist", data=zinc)
summary(coxmod)

# This is the output:
# Call:
# coxph(formula = as.formula(coxformula), data = data, weights = 1/p.i.h.a.t.,
#     na.action = na.omit, control = coxphcontrol, method = "breslow",
#     x = T)
#
#   n=123 (308 observations deleted due to missing)
#
#                              coef exp(coef) se(coef)       z       p
# sexMale                    0.2953     1.344   0.5558  0.5313 6.0e-01
# agepill                    0.0539     1.055   0.0275  1.9612 5.0e-02
# smokeEver                  0.0145     1.015   0.5870  0.0248 9.8e-01
# drinkEver                 -0.8548     0.425   0.5896 -1.4497 1.5e-01
# mildyspMild Dysplasia      0.9023     2.465   0.3937  2.2921 2.2e-02
# moddyspModerate Dysplasia  1.3309     3.784   0.4212  3.1600 1.6e-03
# sevdyspSevere Dysplasia    2.1334     8.444   0.4615  4.6224 3.8e-06
# anyhistFamily History      0.0904     1.095   0.3896  0.2321 8.2e-01
# zncent                    -0.2498     0.779   0.1351 -1.8487 6.4e-02
#
#                           exp(coef) exp(-coef) lower .95 upper .95
# sexMale                       1.344      0.744     0.452      3.99
# agepill                       1.055      0.948     1.000      1.11
# smokeEver                     1.015      0.986     0.321      3.21
# drinkEver                     0.425      2.351     0.134      1.35
# mildyspMild Dysplasia         2.465      0.406     1.140      5.33
# moddyspModerate Dysplasia     3.784      0.264     1.658      8.64
# sevdyspSevere Dysplasia       8.444      0.118     3.417     20.86
# anyhistFamily History         1.095      0.914     0.510      2.35
# zncent                        0.779      1.284     0.598      1.02
#
# Rsquare= NA   (max possible= NA )
# Likelihood ratio test= NA  on 9 df,   p=NA
# Wald test            = 66.5  on 9 df,   p=7.32e-11
# Score (logrank) test = NA  on 9 df,   p=NA