haplo.em {haplo.score} R Documentation

EM Computation of Haplotype Probabilities

Description

For genotypes measured on unrelated subjects, with linkage phase unknown, compute maximum likelihood estimates of haplotype probabilities. Because linkage phase is unknown, more than one pair of haplotypes may be consistent with the observed marker phenotypes, so posterior probabilities of pairs of haplotypes for each subject are also computed.

Usage

haplo.em(geno, locus.label=NA, converge.eps=1e-06, maxiter=500)

Arguments

geno Matrix of alleles, such that each locus has a pair of adjacent columns of alleles, and the order of columns corresponds to the order of loci on a chromosome. If there are K loci, then ncol(geno) = 2*K. Rows represent alleles for each subject.
locus.label Vector of labels for loci, of length K (see definition of geno matrix).
converge.eps Convergence criterion, based on absolute change in log likelihood (lnlike).
maxiter Maximum number of iterations of EM.
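
The column layout of geno can be illustrated with a small hand-built matrix. This is only a sketch: the data, the locus names "A" and "B", and the helper locus.cols are made up for illustration and are not part of the package.

```r
# Hypothetical data: 4 subjects, K = 2 loci (names "A" and "B" are made up).
# Columns 1-2 hold the two alleles at locus A, columns 3-4 those at locus B.
geno <- matrix(
  c(1, 1, 1, 2,
    1, 2, 1, 2,
    2, 2, 2, 2,
    1, 2, 1, 1),
  ncol = 4, byrow = TRUE
)
locus.label <- c("A", "B")

K <- ncol(geno) / 2
stopifnot(K == length(locus.label))  # ncol(geno) must equal 2 * K

# Illustrative helper: alleles at locus k sit in columns 2k - 1 and 2k.
locus.cols <- function(k) c(2 * k - 1, 2 * k)
geno[, locus.cols(2)]  # the allele pairs at locus B, one row per subject
```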

Details

The input data are arranged as a matrix with N rows, one per subject, and 2K columns holding the pairs of alleles for K loci whose phase is unknown. The matrix is first reduced to the distinguishable unphased multilocus marker phenotypes, along with their counts, and for each distinguishable phenotype all possible pairs of haplotypes are enumerated. Maximum likelihood estimation is implemented by the expectation-maximization (EM) algorithm and assumes Hardy-Weinberg proportions of the underlying genotypes, so that the probability of a pair of haplotypes is the product of their probabilities (times 2 if the haplotypes differ). At each iteration, relative probabilities are assigned to the possible underlying pairs of haplotypes for each phenotype. The haplotypes are then "counted" from the enumerated list of all possibilities, using the relative probabilities as weights. These weighted counts determine new haplotype frequencies, which in turn are used to update the relative probabilities. The cycle repeats until the absolute change in lnlike falls below converge.eps, or until maxiter iterations have been completed.
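
The iteration described above can be sketched in base R. This is a simplified illustration, not the package's implementation: the function names enumerate.pairs and em.haplo are hypothetical, haplotypes are coded as strings such as "1-2", the sketch loops over subjects rather than over distinguishable phenotypes with counts, and it enumerates all 2^K phase assignments, so it is only practical for a handful of loci.

```r
# Hypothetical helper: all distinct unordered haplotype pairs consistent with
# one subject's unphased genotype g (a vector of length 2K).
enumerate.pairs <- function(g) {
  K <- length(g) / 2
  a <- matrix(g, nrow = 2)  # column k holds the two alleles at locus k
  flips <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), K)))
  pairs <- t(apply(flips, 1, function(f) {
    h1 <- ifelse(f, a[2, ], a[1, ])
    h2 <- ifelse(f, a[1, ], a[2, ])
    sort(c(paste(h1, collapse = "-"), paste(h2, collapse = "-")))
  }))
  unique(pairs)  # homozygous loci and mirrored phases give duplicate rows
}

# Hypothetical EM sketch; argument names mirror haplo.em for readability only.
em.haplo <- function(geno, converge.eps = 1e-6, maxiter = 500) {
  pairs <- lapply(seq_len(nrow(geno)), function(i) enumerate.pairs(geno[i, ]))
  haps <- sort(unique(unlist(pairs)))
  p <- setNames(rep(1 / length(haps), length(haps)), haps)  # flat start
  lnlike.old <- -Inf
  converge <- 0
  for (iter in seq_len(maxiter)) {
    counts <- setNames(numeric(length(haps)), haps)
    lnlike <- 0  # log likelihood at the frequencies entering this iteration
    for (pr in pairs) {
      # Hardy-Weinberg: P(h1, h2) = p1 * p2, times 2 if the haplotypes differ
      w <- p[pr[, 1]] * p[pr[, 2]] * ifelse(pr[, 1] == pr[, 2], 1, 2)
      lnlike <- lnlike + log(sum(w))
      post <- w / sum(w)  # relative (posterior) probability of each pair
      for (j in seq_along(post)) {
        # Count both haplotypes of the pair, weighted by its probability
        counts[pr[j, 1]] <- counts[pr[j, 1]] + post[j]
        counts[pr[j, 2]] <- counts[pr[j, 2]] + post[j]
      }
    }
    p <- counts / sum(counts)  # new haplotype frequencies
    if (abs(lnlike - lnlike.old) < converge.eps) {
      converge <- 1
      break
    }
    lnlike.old <- lnlike
  }
  list(converge = converge, niter = iter, hap.prob = p, lnlike = lnlike)
}

# Made-up data: two double homozygotes and one double heterozygote.
geno <- matrix(c(1, 1, 1, 1,
                 2, 2, 2, 2,
                 1, 2, 1, 2), ncol = 4, byrow = TRUE)
fit <- em.haplo(geno)
round(fit$hap.prob, 4)
```

In this toy data set the double heterozygote is the only ambiguous subject, and the weights shift toward the pair made of the two haplotypes already seen in the homozygotes.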

Value

List with components:

converge Indicator of convergence of the EM algorithm (1 = converged, 0 = failed).
niter Number of iterations completed in the EM algorithm.
locus.info A list with a component for each locus. Each component is also a list, and the items of a locus-specific list are the locus name and a vector of the unique alleles for the locus.
locus.label Vector of labels for loci, of length K (see definition of input values).
haplotype Matrix of unique haplotypes. Each row represents a unique haplotype, and the number of columns is the number of loci.
hap.prob Vector of maximum likelihood estimates (MLEs) of haplotype probabilities. The ith element of hap.prob corresponds to the ith row of haplotype.
hap.prob.noLD Similar to hap.prob, but assuming no linkage disequilibrium.
lnlike Value of lnlike at last EM iteration (maximum lnlike if converged).
lr Likelihood ratio statistic to test no linkage disequilibrium among all loci.
indx.subj Vector of subject indices, after expanding to all possible pairs of haplotypes for each person. A value of i in indx.subj refers to the ith row of the input matrix geno. If the ith subject has n possible pairs of haplotypes that correspond to their marker phenotype, then i is repeated n times.
nreps Vector for the count of haplotype pairs that map to each subject's marker genotypes.
hap1code Vector of codes for each subject's first haplotype. The values in hap1code are the row numbers of the unique haplotypes in the returned matrix haplotype.
hap2code Similar to hap1code, but for each subject's second haplotype.
post Vector of posterior probabilities of pairs of haplotypes for a person, given their marker phenotypes.
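
The components indx.subj, hap1code, hap2code and post are parallel vectors, with one element per candidate haplotype pair. The sketch below shows one way they might be combined to pick the most probable pair for each subject. It uses a hand-made list that only mimics the documented layout; all values are invented and the list is not actual haplo.em output.

```r
# Mock result with the documented component layout (all values are made up).
fit <- list(
  haplotype = matrix(c(1, 1,
                       1, 2,
                       2, 1,
                       2, 2), ncol = 2, byrow = TRUE),
  indx.subj = c(1, 2, 3, 3),  # subject 3 has two candidate pairs
  hap1code  = c(1, 4, 1, 2),  # row numbers into fit$haplotype
  hap2code  = c(1, 4, 4, 3),
  post      = c(1, 1, 0.9, 0.1)
)

# Posterior probabilities sum to 1 within each subject.
post.sum <- tapply(fit$post, fit$indx.subj, sum)

# Position of the most probable haplotype pair for each subject.
best <- as.integer(tapply(seq_along(fit$post), fit$indx.subj,
                          function(rows) rows[which.max(fit$post[rows])]))

# Turn a vector of haplotype codes into readable labels such as "1-2".
hap.label <- function(codes) {
  apply(fit$haplotype[codes, , drop = FALSE], 1, paste, collapse = "-")
}

res <- data.frame(
  subject = fit$indx.subj[best],
  hap1    = hap.label(fit$hap1code[best]),
  hap2    = hap.label(fit$hap2code[best]),
  post    = fit$post[best],
  stringsAsFactors = FALSE
)
res
```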

References

Excoffier, L., and Slatkin, M., 1995, Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population, Mol. Biol. Evol. 12(5):921-927.

Hawley, M. E., and Kidd, K. K., 1995, HAPLO: a program using the EM algorithm to estimate the frequencies of multi-site haplotypes, J. Hered. 86:409-411.

Long, J. C., Williams, R. C., and Urbanek, M., 1995, An E-M algorithm and testing strategy for multiple-locus haplotypes, Am. J. Hum. Genet. 56:799-810.

Terwilliger, J. D., and Ott, J., 1994, Handbook of human genetic linkage, Johns Hopkins University Press, Baltimore.

See Also

haplo.enum, haplo.hash, haplo.score

Examples

## Don't run: 
haplo <- haplo.em(geno)
## End Don't run
