drm {drm} | R Documentation |
drm
fits a combined regression and association model for longitudinal
or otherwise clustered categorical responses using dependence ratio as
a measure of the association.
drm(formula, family=binomial, data=sys.parent(), weights, offset, subset=NULL, na.action, start=NULL, link="cum", dep="I", Ncond=TRUE, Lclass=2, dropout=FALSE, drop.x=NULL, save.profiles=TRUE, pmatrix=NULL, print.level=2, iterlim=200, ...)
formula |
a formula expression as for other regression models.
In addition the cluster term has to be specified in the
expression by cluster() and if using temporal association
structure the temporal term has to be specified by Time() .
See examples below and the documentation of lm and formula
for further details. |
family |
a description of the link function to be used in the
model for a binary response. Default is logit link.
See family for details. For an ordinal response, link
is defined for the cumulative probabilities when
link -argument is set to "cum". See link below. |
data |
an optional data frame containing the variables in the model. |
weights |
an optional vector of weights to be used in the fitting process. Only equal weights within cluster are allowed. |
offset |
this can be used to specify an a priori known component to be included in the linear predictor during fitting. |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
na.action |
a function which indicates what should happen when
the data contain NAs. The default is na.include after
which the analysis assumes missing data mechanism at random (MAR) if
dropout=FALSE , and not at random if
dropout = TRUE . See dropout below. |
start |
an optional vector of starting values for the
parameters. By default, the starting values
are estimated from glm -procedure assuming independence |
link |
this can be used to specify alternative link functions
for nominal and ordinal responses. By default "cum", after which
the link is specified through family = binomial(link=?) for the
cumulative probabilities. Alternative links include adjacent
category logit "acl" and baseline category logit "bcl" (baseline
category being the last category). For "bcl", the regression
parameters are estimated for each logit level. For a binary response,
this argument is ignored. |
dep |
dep defines the association structure. The default is
independence "I". Other singular options are for the
exchangeable association: Necessary factor "N", Latent
categorical factor "L", Latent Beta-distributed propensity "B"
(binary response), Latent Dirichlet-distributed propensities "D"
(multicategorical response), and for the
temporal association: first order Markov "M", and second order
Markov "M2" (binary response). By default,
Markov structure for the adjacent 2-way dependence ratios is
assumed to be stationary. Superpositions of these structures
can be imposed, such as "NL", "NB","ND", "NM", "LM", "NLM","NM2".
See [3-7] for further details.
Parameter restrictions, covariates and functional forms for the
association parameters can also be specified. In that case
the dep -argument must be a list. See examples below. For
the interpretation of the association parameters, see the
documentation of the support function getass.drm . |
Ncond |
logical argument defining whether the regression model is marginal or
conditional when the association is "N". The default is TRUE ,
i.e. the regression estimation is conditional on {N=1}. If
covariates are used for the "N"-association, it is advisable to set
Ncond=FALSE , since otherwise the interpretation of the
regression parameters is less clear. |
Lclass |
Number of latent classes in the population when the association is
"L". Default is 2. Available only for binary response. Note that in the
current implementation, the conditional probabilities are not
calculated for Lclass >2. For checking the validity of the
model, the user needs to check whether the estimated conditional
probabilities fall within 0 and 1. See example in getass.drm for
parameter interpretation and how to calculate the conditional probabilities |
dropout |
logical argument. For monotone missing patterns in longitudinal studies,
this argument allows to impose a selection model (see [8] for
details) on top of regression and association model to investigate the sensitivity of
the results due to missingness. The model formula notation is:
logit(hz(drop.cur)) = (Intercept)d+response.cur+response.prev ,
where response.cur denotes the effect of current, possibly missing
response value and response.prev denotes the effect of previous
response value. MCAR, MAR and MNAR-models can be
specified by imposing restrictions on selection model parameters
in dep -argument as for the association parameters. See
dep above and examples below. If the response is a factor, the effect of
the factor levels are estimated contrasting to the lowest level. |
drop.x |
an optional covariate vector for the selection model.
The covariate's previous value (notation: covariate.prev ) is used in
the selection model. |
save.profiles |
logical argument defining whether the fitted
values of all possible profiles is saved. If FALSE , only the
indicator vector (-1 for a negative, 1 for a positive profile) over all
units will be saved. If the cluster size is large, using
save.profiles=TRUE may result in a very large object. |
pmatrix |
a character object specifying the name of the matrix
for all possible profiles, created using
profiles.drm . If the cluster size is large,
this speeds up the estimation in case several models are fitted.
See examples below. |
print.level |
level of printing during numerical optimisation.
The default is 2. See nlm for further details. |
iterlim |
maximum iteration limit for the numerical maxisimisation.
See nlm for further details. |
... |
other arguments passed to nlm , e.g. controlling the
convergence. See nlm for further details. |
drm
gives maximum likelihood estimates for the combined
regression and association model by decomposing a
joint probability of responses in a cluster to univariate
marginal or cumulative probabilities and dependence ratios of all
orders. See [1] and [5] for further details. The dimensionality
of the association part is reduced by imposing a model for
the association structure with dep
-argument. See
getass.drm
and [3-7] for details. Furthermore, a
selection model can be added on top of regression and association
model. See examples below and [5] and [8] for details.
drm
returns an object of class drm
.
The function summary
(i.e., summary.drm
) can be
used to obtain or print a summary of the results. The generic
accessor function coefficients
can be used to extract
coefficients.
An object of class drm
is a list containing at least the
following components:
coefficients |
a named vector of regression, and possibly association and selection model coefficients. |
cov.scaled |
a variance-covariance matrix of the parameter estimates. |
fitted.marginals |
the fitted values for the univariate means, obtained by transforming the linear predictors by the inverse of the link function. |
fitted.conditionals |
in case of "L"-structure, the fitted
values for the conditional univariate means, otherwise NULL. Not
yet implemented for Lclass >2; see also
getass.drm . |
fitted.profiles |
the fitted response profile probabilities
within each cluster, calculated by using the maximum likelihood
estimates from the model. See also save.profiles
above. Note that within each cluster, the order of the
responses is by Time for Markov structures, and for
exchangeable structures with missing values, by response
value, with missing values (NA) last. |
deviance |
minus twice the maximised log-likelihood. |
aic |
An Information Criterion: minus twice the maximised log-likelihood plus twice the number of coefficients. Not available if the likelihood is weighted with the dropout probabilities. |
niter |
the number of iterations that nlm used. |
code |
convergence code from nlm . See nlm for details. |
call |
the matched call. |
terms |
the `terms' object used. |
The maximum likelihood estimates may sometimes lead to negative fitted probabilities. In this case, both generic print-methods warn about this. In this case, the model is considered to be wrongly specified and model specification should be changed.
Jukka Jokinen, jukka.jokinen@helsinki.fi
1. Ekholm A, Smith PWF, McDonald JW. Marginal regression analysis of a multivariate binary response. Biometrika 1995; 82(4):847-854.
2. Ekholm A, Skinner C. The Muscatine children's obesity data reanalysed using pattern mixture models. Applied Statistics 1998; 47:251-263.
3. Ekholm A, McDonald JW, Smith PWF. Association models for a multivariate binary response. Biometrics 2000; 56:712-718.
4. Ekholm A, Jokinen J, Kilpi T. Combining regression and association modelling on longitudinal data on bacterial carriage. Statistics in Medicine 2002; 21:773-791.
5. Ekholm A, Jokinen J, McDonald JW, Smith PWF. Joint regression and association modelling of longitudinal ordinal data. Biometrics 2003; 59:795-803.
6. Jokinen J, McDonald JW, Smith PWF. Meaningful regression and association models for clustered ordinal data. Sociological Methodology 2006; 36:173-199.
7. Jokinen J. Fast estimation algorithm for likelihood-based analysis of repeated categorical responses. Computational Statistics and Data Analysis 2006; 51:1509-1522.
8. Diggle PJ, Kenward MJ. Informative dropout in longitudinal data analysis. Applied Statistics 1994; 43: 49-94.
getass.drm
, nlm
,
cluster
, Time
profiles.drm
, depratio
###################################################### ## Examples for binary responses ########################################### ## Wheeze among Steubenville (see [3]): ## Latent Beta-distributed propensity data(wheeze) fit1 <- drm(wheeze~I(age>9)+smoking+cluster(id),data=wheeze,dep="B", print=0) ## Obesity among Muscatine children (see [2]): ## Analysis for completers: M2 for girls data(obese) fit2 <- drm(obese~age+cluster(id)+Time(year), subset=sex=="female", dep="M2",data=obese) ## Not run: ## Muscatine children continued (see [3]): ## LM for boys and girls separately fit3 <- drm(obese~age+cluster(id)+Time(age), subset=sex=="male", dep="LM",data=obese) fit4 <- drm(obese~age+cluster(id)+Time(age), subset=sex=="female", dep="LM",data=obese) ## End(Not run) ############################################ ## Examples for ordinal responses ############################################ ## Movie critic example (see [6]): ## Latent Dirichlet propensities with baseline category link. data(movie) options(contrasts=c("contr.treatment","contr.treatment")) fit5 <- drm(y~critic+cluster(movie), data=movie, dep="D", link="bcl") ## Longitudinal dataset on teenage marijuana use (see [6]): ## Superposition of structures N, L and M for the girls. data(marijuana) fit6 <- drm(y~age+cluster(id)+Time(age), data=marijuana, subset=sex=="female", dep=list("NLM", ~kappa1==1, ~kappa2==0, ~tau12==1, ~tau21==1, ~tau11==tau22)) ## Parameter restrictions with functions using M-structure for the boys. ## Plot the second order dependence ratios: plot(depratio(y~cluster(id)+Time(age), data=marijuana, subset=sex=="male")) ## fit the model in [6]: fit7 <- drm(y~age+cluster(id)+Time(age), data=marijuana, subset=sex=="male", dep=list("M", tau12~function(a=1,b=0) a+b*c(0:3), tau21~function(a=1,b=0) a+b*c(0:3))) ## Not run: ############################################## ## Covariates for the association (see [7]): ############################################## data(madras) ## plot empirical 2nd order dependence ratios with bootstrap CI's tau.madras <- depratio(symptom~cluster(id)+Time(month), data=madras, boot.ci = TRUE, n.boot = 1000) plot(tau.madras, log="y", ylim=c(1,40), plot.ci=TRUE) ## create matrix for profiles: W.madras <- profiles.drm(n.categories=2, n.repetitions=12, "M") ## create four-level covariate, combining age and sex: madras$age.sex <- factor(paste(madras$age,madras$sex,sep=".")) ## fit the model in [7], Section 4: fit8 <- drm(symptom~age+sex+month+month:age+month:sex+cluster(id)+Time(month), data=madras, Ncond=FALSE, save.profiles=FALSE, pmatrix="W.madras", dep=list("NM",nu~nu:age.sex, tau~function(a0=0,a1=0) 1+a0*exp(a1*c(0:10))), print=2) ################################################### ## Dropout model on top of regression & association ################################################### ## Continue with the madras data. ## fit a model without the dropout model: fit9 <- drm(symptom~age+sex+month+month:age+month:sex+cluster(id)+Time(month), data=madras, save.profiles=FALSE, pmatrix="W.madras", print=0, dep=list("NM", tau~function(a0=0,a1=0) 1+a0*exp(a1*c(0:10)))) ## A dropout model assuming MCAR for the thought disorders: mcar <- drm(symptom~age+sex+month+month:age+month:sex+cluster(id)+Time(month), data=madras, save.profiles=FALSE, pmatrix="W.madras", dep=list("NM", tau~function(a0=0,a1=0) 1+a0*exp(a1*c(0:10)), ~symptom.cur==0,~symptom.prev==0), dropout=TRUE, start=c(coef(fit9), -4)) ## A dropout model assuming MAR; including sex as a covariate: mar <- drm(symptom~age+sex+month+month:age+month:sex+cluster(id)+Time(month), data=madras, save.profiles=FALSE, pmatrix="W.madras", dep=list("NM", tau~function(a0=0,a1=0) 1+a0*exp(a1*c(0:10)), ~symptom.cur==0), dropout=TRUE, drop.x=sex, start=c(coef(mcar),0,0)) ## A dropout model assuming MNAR and sex as a covariate: mnar <- drm(symptom~age+sex+month+month:age+month:sex+cluster(id)+Time(month), data=madras, save.profiles=FALSE, pmatrix="W.madras", dep=list("NM", tau~function(a0=0,a1=0) 1+a0*exp(a1*c(0:10))), dropout=TRUE, drop.x=sex, start=c(coef(mcar),0,0,0)) ## print out coefficients and std.errors: coef(summary(mnar)) ## End(Not run) ## std.error of `symptom.cur' all over the place; too few dropouts ## for a comprehensive evaluation of the dropout mechanism