popdiv {popgen} | R Documentation |
This function assesses between population variability using unlinked multi-locus multi-allelic genotype data from the populations of interest.
popdiv(L, P, NUMA, N, X, burnin = 10000, iter = 10000, m = 300, csd = 0.01, outc = FALSE)
L |
Number of loci |
P |
Number of populations |
NUMA |
Vector of the number of alleles at each loci |
N |
LxP matrix containing the number of genotypes in each population at each locus. |
X |
Matrix with L rows, one for each locus. Each row contains a list of allele counts from each population. For example, if there are 2 populations and the locus has 3 alleles then the first 3 entries in the row specify the allele counts in population 1 and the next 3 entries specify the allele counts in population 2. The rest of the row is set to 0. |
burnin |
Number of burn-in iterations in MCMC run |
iter |
Number of sampling iterations in MCMC run |
m |
Scale parameter of Dirichlet distribution used to update global allele frequencies (p). |
csd |
Standard deviation of Normal distribution used to update the population variance parameters (c). |
outc |
Flag that specifies whether the samples of c are returned (Logical) |
The model fitted has a Multinomial-Dirichlet structure with a specific parameterisation suitable for the situation.
x_{il_{j}} = number of type j alleles observed at locus l in population i.
n_{il_{j}} = number of type j alleles observed at locus l in population i.
α_{il_j} = locus l type j allele frequency of the population i.
π_{l_j} = locus l type j allele frequency of the `global' population.
c_{i} = variance parameter of population i.
J_l = number of alleles at locus l.
{x_{il_{1}}, ..., x_{il_{J_l}}} sim textrm{Multinom}(α_{il_1}, ..., α_{il_{J_l}})
{α_{il_{1}}, ..., α_{il_{J_l}}} sim textrm{Dirichlet}(π_{l_1}frac{(1 - c_{i})}{c_{i}}, ..., pi_{l_{J_l}}frac{(1 - c_{i})}{c_{i}})
The model is specified in a Bayesian framework by placing uniform priors on the c and π parameters.
A Metropolis-Hastings algorithm is used to sample from the posterior distribution of the parameters c and π.
The proposal distribution for the c parameters is
c^{textrm{new}} sim textrm{N}(c^{textrm{old}}, textrm{csd}^2)
The proposal distribution for the π parameter is
π_l^{textrm{new}} sim textrm{Dirichlet}(mπ_{l1}, ..., mπ_{lJ_l})
The conjugate nature of the model allows the α parameters to be integrated out of the likelihood so that these parameters are not sampled.
The c parameters are analogous to Fst estimates which are traditional measures of population diversity. See [1] for more details.
List with components
C.sample |
An (iter x P) matrix containing the samples of the c parameters. |
muc |
The posterior mean of the c parameters. |
sdc |
The posterior standard deviation of the c parameters. |
mup |
The posterior mean of the π parameters. |
sdp |
The posterior standard deviation of the π parameters. |
Jonathan Marchini
[1] Nicholson et al. (2002), Assessing population differentiation and isolation from single-nucleotide polymorphism data. JRSS(B), 64, 695–715
[2] Marchini and Cardon (2002) Discussion of Nicholson et al. (2002). JRSS(B), 64, 740–741
[3] Nichols and Balding (1995) A method for quantifying differentiation between populations at multi-allelic loci and its implications for investigating identity and paternity. Genetica, 96, 3–12.
## Note : In the examples below the number of burn-in and sampling ## iterations is set to 100 so that these examples run in a reasonable ## time. In practice we recommend setting these parameters much higher so ## that convergence of the Markov chain to the stationary distribution is ## more likely. ############### ## EXAMPLE 1 ## ############### ## SNP dataset from Nicholson et al. (2002). 66 loci, 4 populations ex1 <- read.table(paste(.path.package("popgen"), "example_data1", sep = .Platform$file.sep)) ex1 <- matrix(unlist(ex1), 66, 13, byrow = FALSE) NUMA <- ex1[,1] N <- ex1[, 2:5] X <- ex1[, 6:13] L <- 66 P <- 4 res <- popdiv(L, P, NUMA, N, X, burnin = 100, iter = 100, m = 10, csd = 0.01, outc = TRUE) plot(c(0, 100), c(0, max(res$C.sample)), type = "n") lines(1:100, res$C.sample[, 1]) lines(1:100, res$C.sample[, 2], col = 2) lines(1:100, res$C.sample[, 3], col = 3) lines(1:100, res$C.sample[, 4], col = 4) ############### ## EXAMPLE 2 ## ############### ## Simulated dataset from the Multinomial-Dirichlet model. ## 100 loci, 3 populations. ## Variance parameters set to 0.01, 0.02, 0.04. ## Each loci has a variable number of alleles (between 3 and 8)). ## Dataset stored in the following format :- ## - each individuals genotypes are stored in 2 rows of the file. ## - the first entry in each row specifies the individuals population. ## - each row has (L + 1) entries where L is the number of loci. ## To examine the datafile use edit(ex2) ex2 <- read.table(paste(.path.package("popgen"), "example_data2", sep = .Platform$file.sep)) X <- matrix(unlist(ex2), nrow(ex2), ncol(ex2), byrow = FALSE) X1 <- popdiv.convert(X) res1 <- popdiv(L = nrow(X1$N), P = ncol(X1$N), NUMA = X1$NUMA, N = X1$N, X = X1$X, burnin = 100, iter = 100, m = 10, csd = 0.01, outc = TRUE) plot(c(0, 100), c(0, max(res1$C.sample)), type = "n") lines(1:100, res1$C.sample[, 1]) lines(1:100, res1$C.sample[, 2], col = 2) lines(1:100, res1$C.sample[, 3], col = 3)