luca {luca} | R Documentation |
In genetic association studies, there is increasing interest in understanding the joint effects of genetic and nongenetic factors. For rare diseases, the case-control study is the standard design and logistic regression is the standard method of inference. However, the power to detect statistical interaction is a concern, even with relatively large samples. LUCA implements maximum likelihood inference under covariate assumptions.
Maximum likelihood under covariate assumptions offers improved precision of interaction estimators compared to the standard logistic regression approach, which makes no assumptions about the distribution of covariates.
luca(pen.model, gLabel, dat, HWP = FALSE, dep.model = NULL)
pen.model |
an R formula specifying the disease penetrance model
relating a genetic factor and a number of nongenetic attributes (the
predictors or transformations thereof) to disease status. A typical
pen.model
has the form d ~ g + a + g:a where d is a binary disease
response, g is a genetic factor, a is a
(possibly continuous) nongenetic factor and g:a is the interaction
between the genetic and nongenetic factors. |
gLabel |
a character string specifying the name of the genetic factor in pen.model. |
dat |
a data frame containing the variables in pen.model;
there is currently no default value. Each row of dat is
considered as one multivariate observation for a subject. Note that the
genetic term must be a factor object, and also needs to
be a genotype object in some cases (as described
in the following arguments). Currently, the disease response variable must
be numeric with values 0 (unaffected) and 1 (affected).
Also, note that missing values are not allowed in the data frame. |
HWP |
a logical value indicating whether the genotype frequencies
in controls should be assumed to follow Hardy-Weinberg proportions.
When TRUE, the genetic term must be a genotype
object. |
dep.model |
an R formula specifying the dependence between the
genetic factor and nongenetic attributes. (See the Details section below for
more on the dependence model.) When NULL (default),
it indicates independence between the genetic factor and nongenetic
attributes in controls. The argument HWP is ignored for a
non-null dep.model. The genetic factor must be a
genotype object when dep.model is provided. |
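The requirements on dat stated above can be checked before calling luca. The following is a minimal sketch in base R; the toy data frame is made up for illustration, and check_luca_input() is a hypothetical helper that is not part of the luca package.

```r
# Hypothetical toy data; the column names d, g and a follow the
# pen.model example (d ~ g + a + g:a) and are otherwise arbitrary.
toy <- data.frame(
  d = c(0, 1, 0, 1, 1, 0),
  g = factor(c("C/C", "C/T", "T/T", "C/T", "C/C", "T/T")),
  a = c(1.2, 0.4, 2.1, 0.9, 1.7, 0.3)
)

# check_luca_input() is a hypothetical helper, NOT a luca function.
# It returns a character vector describing any violated requirement.
check_luca_input <- function(dat, gLabel, dLabel) {
  stopifnot(is.data.frame(dat), gLabel %in% names(dat), dLabel %in% names(dat))
  problems <- character(0)
  if (any(is.na(dat))) {
    problems <- c(problems, "missing values present")
  }
  if (!is.factor(dat[[gLabel]])) {
    problems <- c(problems, "genetic term is not a factor")
  }
  d <- dat[[dLabel]]
  if (!is.numeric(d) || !all(d %in% c(0, 1))) {
    problems <- c(problems, "disease response is not numeric 0/1")
  }
  problems
}

# An empty result means no problems were found.
check_luca_input(toy, gLabel = "g", dLabel = "d")
```

This sketch does not check whether the genetic term is also a genotype object, which the arguments above require when HWP = TRUE or when dep.model is supplied.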
Inference for association parameters is obtained by fitting a
conditional logistic regression model
with appropriate match-sets composed of
“pseudo-individuals” having all possible values of the genetic
factor and disease status but a common value of the nongenetic attribute.
The function coxph.fit from the survival package is used to fit the conditional logistic regression.
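The match-set idea can be illustrated in base R. The sketch below builds, for each subject, one pseudo-individual per combination of genotype level and disease status, all sharing that subject's attribute value. The data, the helper make_match_set() and the column observed are assumptions made for illustration; the package's internal construction may differ.

```r
# Hypothetical toy data: one row per subject.
subjects <- data.frame(
  id = 1:3,
  d  = c(1, 0, 1),
  g  = factor(c("AA", "AB", "BB"), levels = c("AA", "AB", "BB")),
  a  = c(0.5, 1.3, 2.0)
)

# make_match_set() is a hypothetical helper, not part of luca.
# It expands one subject into all (genotype, disease status) combinations
# while holding the nongenetic attribute a fixed at the subject's value.
make_match_set <- function(row) {
  ps <- expand.grid(g = levels(row$g), d = c(0, 1), stringsAsFactors = FALSE)
  ps$id <- row$id
  ps$a <- row$a
  # Flag the pseudo-individual that matches what was actually observed.
  ps$observed <- as.integer(ps$g == as.character(row$g) & ps$d == row$d)
  ps[, c("id", "g", "d", "a", "observed")]
}

match_sets <- do.call(
  rbind,
  lapply(seq_len(nrow(subjects)), function(i) make_match_set(subjects[i, ]))
)

# 3 subjects x 3 genotypes x 2 disease states = 18 pseudo-individuals.
nrow(match_sets)
head(match_sets)
```

In a conditional logistic regression on data of this shape, id would play the role of the stratum and observed the role of the event indicator.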
A dependence model such as g ~ a specifies a polychotomous
regression model for g as a function of a.
Typically a is also a term in the penetrance model.
The polychotomous regression for the genetic factor g
given the attribute a holds
when the conditional distribution of a given g
is from the exponential family of distributions, with a constant
dispersion parameter across the levels of the genetic factor.
To model conditional independence of a genetic factor g
and a nongenetic attribute a given a third variable a2,
specify the dependence model g ~ a2.
See Shin, McNeney and Graham (2007) for details.
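The exponential-family condition can be checked numerically in one simple case. The sketch below assumes a normally distributed attribute with a common standard deviation across genotypes, and confirms that the log-odds of each genotype against a baseline are linear in a, which is the form a polychotomous regression takes. The genotype frequencies, means and standard deviation are made-up values, and log_odds_vs_AA() is a hypothetical helper.

```r
# Made-up population values for three genotypes.
geno_freq <- c(AA = 0.49, AB = 0.42, BB = 0.09)  # P(g)
geno_mean <- c(AA = 0.0,  AB = 0.5,  BB = 1.0)   # E(a | g)
common_sd <- 1.2                                 # same dispersion for every g

a_grid <- seq(-2, 2, by = 0.5)

# Hypothetical helper: log{ P(g | a) / P(AA | a) } via Bayes' rule.
log_odds_vs_AA <- function(g, a) {
  num <- geno_freq[[g]]    * dnorm(a, mean = geno_mean[[g]],    sd = common_sd)
  den <- geno_freq[["AA"]] * dnorm(a, mean = geno_mean[["AA"]], sd = common_sd)
  log(num / den)
}

lo_AB <- log_odds_vs_AA("AB", a_grid)
lo_BB <- log_odds_vs_AA("BB", a_grid)

# Constant first differences mean the log-odds are linear in a.
slope_AB <- diff(lo_AB) / diff(a_grid)
slope_BB <- diff(lo_BB) / diff(a_grid)

# Under these assumptions the slope is (mean_g - mean_AA) / sd^2.
range(slope_AB)
range(slope_BB)
```

If the standard deviation were allowed to differ by genotype, a quadratic term in a would appear in the log-odds and the linear form would no longer hold.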
luca also allows dependence models of the form
g ~ a1 + a2 + ... for multiple attributes a1, a2, ...
However, there is no
formal justification for the use of such a model to capture
the dependence between g and multiple nongenetic attributes.
An object of class "luca" with the following components:
call |
the function call |
coefficients |
estimates of parameters in the
covariate model (labelled as covmod.XX) and the penetrance model
(labelled as penmod.YY, where YY denotes the name of a term
in the model).
The covariate model parameters depend on the covariate assumptions and are
1) control-population log-odds for each level of the genetic
factor relative to a baseline level under independence,
2) control-population log-odds for each allele relative to a baseline allele
under independence plus HWP, or
3) the parameters from the polychotomous regression model under
dependence (see the Details section for
a description of this model).
|
var |
the variance-covariance matrix of the parameter estimates. |
iter |
number of iterations in the iterative search for parameter estimates |
The function summary.luca (or summary) can be used to obtain a summary of the results in a similar style to the lm and glm summaries.
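The coefficients and var components described above are enough to compute a Wald-style table by hand. The sketch below uses a mock list with made-up numbers and hypothetical term names in place of a fitted object; it is not output from luca, and the layout may not match what summary.luca prints.

```r
# Mock object mimicking the documented components of a "luca" fit.
# All numbers and term names are made up for illustration only.
term_names <- c("penmod.a", "penmod.g:a")
fit_like <- list(
  coefficients = setNames(c(0.40, -0.25), term_names),
  var = matrix(c(0.0400, 0.0100,
                 0.0100, 0.0625),
               nrow = 2, dimnames = list(term_names, term_names))
)

# Standard errors are the square roots of the diagonal of var.
se <- sqrt(diag(fit_like$var))

# Wald z statistics and two-sided normal p-values.
z <- fit_like$coefficients / se
p <- 2 * pnorm(-abs(z))

wald_table <- cbind(Estimate = fit_like$coefficients,
                    Std.Error = se,
                    z = z,
                    p.value = p)
print(round(wald_table, 3))
```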
Inference is not robust to misspecification
of the covariate assumptions. There should be strong a priori evidence
to support any assumptions that are made. Alternatively, luca may be used
to screen for “interesting” interactions that are followed up
with logistic regression using data from a larger study.
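A follow-up analysis of this kind could use glm, which makes no assumptions about the covariate distribution. The sketch below is only an illustration on simulated data: the sample size, effect sizes and the variable g_count (an allele count coded 0, 1, 2) are all assumptions, and no real study is represented.

```r
set.seed(1)

# Simulated stand-in for a larger follow-up study (all values made up).
n <- 500
g_count <- sample(0:2, n, replace = TRUE, prob = c(0.49, 0.42, 0.09))
a <- rnorm(n)

# Assumed true model with a gene-by-attribute interaction.
eta <- -1 + 0.3 * g_count + 0.2 * a + 0.4 * g_count * a
d <- rbinom(n, size = 1, prob = plogis(eta))

follow_up <- data.frame(d = d, g_count = g_count, a = a)

# Standard logistic regression with the interaction term of interest.
fit_glm <- glm(d ~ g_count + a + g_count:a, family = binomial, data = follow_up)
coef(summary(fit_glm))
```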
Ji-Hyung Shin, Brad McNeney, Jinko Graham
Shin J-H, McNeney B, Graham J (2007). Case-Control Inference of Interaction between Genetic and Nongenetic Risk Factors under Assumptions on Their Distribution. Statistical Applications in Genetics and Molecular Biology 6(1), Article 13. Available at: http://www.bepress.com/sagmb/vol6/iss1/art13.
summary.luca, glm, coxph, clogit
data(lucaDat)
pen.model <- formula(d ~ I(allele.count(g, "C")) + a + a2 + I(allele.count(g, "C")):a2)

# 1. Assuming independence and HWP
fitHWP <- luca(pen.model = pen.model, gLabel = "g", dat = lucaDat, HWP = TRUE)
fitHWP$coef
fitHWP$var
summary.luca(fitHWP)  # OR 'summary(fitHWP)'

# 2. Assuming independence only
fitDefault <- luca(pen.model = pen.model, gLabel = "g", dat = lucaDat)
fitDefault$coef
fitDefault$var

# 3. Allowing for dependence
fitDep <- luca(pen.model = pen.model, gLabel = "g", dat = lucaDat, dep.model = formula(g ~ a))
fitDep$coef
fitDep$var