hapassoc {hapassoc} | R Documentation |
This function takes a dataset of haplotypes in which rows for individuals of uncertain phase have been augmented by “pseudo-individuals” who carry the possible multilocus genotypes consistent with the single-locus phenotypes. The EM algorithm is used to find maximum likelihood estimates (MLEs) for trait associations with covariates in generalized linear models.
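To make the idea of weighted pseudo-individuals concrete, the following base R sketch computes the weights for one person who is heterozygous at two SNPs. It is a simplified illustration rather than the package's code: the haplotype frequencies are made up, Hardy-Weinberg proportions are assumed, and the trait and covariates are ignored, whereas the weights used by hapassoc also depend on the fitted risk model.

```r
# Hypothetical haplotype frequencies for two SNPs (they sum to 1).
freq <- c(h00 = 0.4, h01 = 0.3, h10 = 0.2, h11 = 0.1)

# A double heterozygote is consistent with two haplotype pairs,
# so the data set would hold two pseudo-individuals for this person.
pairs <- list(c("h00", "h11"), c("h01", "h10"))

# Assumption: Hardy-Weinberg proportions, so a pair of distinct
# haplotypes a, b has probability 2 * f_a * f_b.
unnorm <- vapply(pairs,
                 function(p) 2 * freq[[p[1]]] * freq[[p[2]]],
                 numeric(1))

# Normalise so the weights of the pseudo-individuals sum to 1.
wts <- unnorm / sum(unnorm)
names(wts) <- c("h00/h11", "h01/h10")
print(wts)
```

Each pseudo-individual row then enters the model fit with its weight, and the weights are recomputed at every EM iteration.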
hapassoc(form, haplos.list, baseline = "missing", family = binomial(), freq = FALSE, maxit = 50, tol = 0.001, ...)
form | model equation in usual R format |
haplos.list | list of haplotype data from pre.hapassoc |
baseline | optional, haplotype to be used for baseline coding. Default is the most frequent haplotype. |
family | binomial, poisson, gaussian or gamma are supported, default=binomial |
freq | initial estimates of haplotype frequencies; default values are calculated in pre.hapassoc using standard haplotype-counting (i.e. EM algorithm without adjustment for non-haplotype covariates) |
maxit | maximum number of iterations of the EM algorithm; default=50 |
tol | convergence tolerance in terms of the maximum difference in parameter estimates between iterations; default=0.001 |
... | additional arguments to be passed to the glm function, such as starting values for parameter estimates in the risk model |
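The tol argument can be read as a stopping rule on successive parameter vectors. The sketch below shows one way such a rule could look in base R; has_converged is a hypothetical helper, and the use of a strict inequality is an assumption, so the package's internal check may differ in detail.

```r
# Hypothetical helper, not part of hapassoc: TRUE when the largest
# absolute change in any parameter estimate is below the tolerance.
has_converged <- function(old, new, tol = 0.001) {
  max(abs(new - old)) < tol
}

old_est <- c(0.50, 1.20)

# Largest change is about 0.0004, below the default tolerance.
small_step <- has_converged(old_est, c(0.5004, 1.2003))

# Largest change is about 0.01, above the default tolerance.
large_step <- has_converged(old_est, c(0.5100, 1.2003))

print(c(small_step = small_step, large_step = large_step))
```

If the EM algorithm stops at maxit without meeting the tolerance, raising maxit or loosening tol in the call to hapassoc are the two adjustments the usage line exposes.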
it | number of iterations of the EM algorithm |
beta | estimated regression coefficients |
freq | estimated haplotype frequencies |
fits | fitted values of the trait |
wts | final weights calculated in last iteration of the EM algorithm. These are estimates of the conditional probabilities of each multilocus genotype given the observed single-locus genotypes. |
var | joint variance-covariance matrix of the estimated regression coefficients and the estimated haplotype frequencies |
dispersionML | maximum likelihood estimate of dispersion parameter (to get the moment estimate, use summary.hapassoc) |
family | family of the generalized linear model (e.g. binomial, gaussian, etc.) |
response | trait value |
converged | TRUE/FALSE indicator of convergence. If the algorithm fails to converge, only the converged indicator is returned. |
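Because a non-converged fit carries only the converged indicator, it is safest to test that component before reading anything else. The sketch below wraps this check in describe_fit, a hypothetical helper that is not part of the package; the call to hapassoc reuses the formula from this page's example and runs only if the package is installed.

```r
# Hypothetical helper, not part of hapassoc: describe a fitted object.
describe_fit <- function(fit) {
  # On non-convergence only 'converged' is returned, so check it
  # before touching any other component.
  if (!isTRUE(fit$converged)) {
    return("EM did not converge; no estimates available")
  }
  sprintf("converged after %d iterations with %d regression coefficients",
          as.integer(fit$it), length(fit$beta))
}

# Run against the package's example data only if hapassoc is installed.
if (requireNamespace("hapassoc", quietly = TRUE)) {
  library(hapassoc)
  data(hypoDat)
  pre <- pre.hapassoc(hypoDat, 3)
  fit <- hapassoc(affected ~ attr + h000 + h010 + h011 + h100 + pooled,
                  pre, family = binomial())
  print(describe_fit(fit))
  # summary() dispatches to summary.hapassoc for a converged fit.
  if (isTRUE(fit$converged)) print(summary(fit))
}
```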
When fitting logistic regression models (i.e. family=binomial()), you will see warning messages:
non-integer #successes in a binomial glm! in: eval(expr, envir, enclos)
even when the response variable includes only counts. These warnings result from fitting a weighted logistic regression at each iteration of the EM algorithm and can be safely ignored.
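The same warning can be reproduced with glm alone, which shows that it comes from the fractional weights rather than from the data. The sketch below uses made-up data and weights, fits a weighted logistic regression, and records the warning messages with a calling handler instead of silencing every warning.

```r
# Made-up 0/1 responses, one covariate, and fractional weights of the
# kind the EM algorithm assigns to pseudo-individuals.
y <- c(0, 1, 1, 0, 1, 0, 1, 0)
x <- c(0.1, 0.9, 0.2, 0.3, 0.8, 0.7, 0.6, 0.4)
w <- c(1, 0.6, 0.4, 1, 0.25, 0.75, 1, 1)

msgs <- character(0)

# Collect warning messages while letting the fit continue.
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial(), weights = w),
  warning = function(cond) {
    msgs <<- c(msgs, conditionMessage(cond))
    invokeRestart("muffleWarning")
  }
)

print(msgs)
print(coef(fit))
```

Capturing the messages this way makes it possible to confirm that the only warnings raised are the expected ones before ignoring them.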
Burkett K, McNeney B, Graham J (2004). A note on inference of trait associations with SNP haplotypes and other attributes in generalized linear models. Human Heredity, 57:200-206.
pre.hapassoc, summary.hapassoc, glm, family.
data(hypoDat)
example.pre.hapassoc <- pre.hapassoc(hypoDat, 3)

example.pre.hapassoc$initFreq # look at initial haplotype frequencies
#       h000       h001       h010       h011       h100       h101       h110
# 0.25179111 0.26050418 0.23606001 0.09164470 0.10133627 0.02636844 0.01081260
#       h111
# 0.02148268

names(example.pre.hapassoc$haploDM)
# "h000"   "h001"   "h010"   "h011"   "h100"   "pooled"

# Columns of the matrix haploDM score the number of copies of each haplotype
# for each pseudo-individual.

# Logistic regression for a multiplicative odds model having as the baseline
# group homozygotes '001/001' for the most common haplotype
example.regr <- hapassoc(affected ~ attr + h000 + h010 + h011 + h100 + pooled,
                         example.pre.hapassoc, family = binomial())

# Logistic regression with separate effects for 000 homozygotes, 001 homozygotes
# and 000/001 heterozygotes
example2.regr <- hapassoc(affected ~ attr + I(h000==2) + I(h001==2) +
                            I(h000==1 & h001==1),
                          example.pre.hapassoc, family = binomial())