mcmcFmodel {Geneland} | R Documentation |
Markov Chain Monte-Carlo inference in the spatial F-model
mcmcFmodel(repdat, repmcmc, lambdamax, dt, nclassmin, nclassinit, nclassmax, nppmax, nchain, stepw, model, varnclass, spatial)
repdat |
Path to input files directory |
repmcmc |
Path to output files directory |
lambdamax |
Maximum rate of Poisson process (real number >0).
Setting lambdamax equal to the number of individuals in the
dataset has proved to be efficient in many cases. |
dt |
Parameter prescribing the amount of unctertainty attached
to spatial coordinates. If dt =0 spatial coordinates are
consiered as true coordinates, if dt >0 it is assumed that observed
coordinates are true coordinates blurred by an additive noise uniform
on a square of side dt centered on 0. |
nclassmin |
Minimum number of populations (integer >=1) |
nclassinit |
Initial number of populations
( integer sucht that
nclassmin =< nclassinit =< nclassmax ) |
nclassmax |
Maximum number of populations (integer >=
nclassinit ).
There is no obvious rule to select nclassmax ,
it should be set to a value larger than any value that
you can reasonably expect for your data. |
nppmax |
Integer: Maximum number of nuclei in the
Poisson-Voronoi tessellation. A good guess consists in setting this
value equal to 10*lambdamax . The relevance of this rule can be
checked by inspection of the MCMC run. The number of tiles should not
go too close from nppmax . If it does, you should re-run your
chain with a larger value for nppmax |
nchain |
Number of MCMC iterations |
stepw |
Number of MCMC iterations between two writing steps (if stepw =1, all
states are saved whereas if e.g. stepw =10 only each 10 iteration is saved) |
model |
Character: "Falush" or "Dirichlet" (model for
frequencies).
See also details in detail section of Geneland help page. |
varnclass |
Logical: if TRUE the number of class is treated as
unknown and will vary along the MCMC inference, if FALSE it will be
fixed to the initial value nclassinit .
varnclass = TRUE *should not* be used in conjunction with
model = "Falush" as in this case it seems that large numbers
of populations are not penalized enough and there is a serious risk
of inferring spurious sub-populations. |
spatial |
Logical: if TRUE the colored Poisson-Voronoi tessellation is used as a prior for the spatial organisation of populations. If FALSE, all clustering receive equal prior probability. In this case spatial information (i.e coordinates) are not used and the locations of the nuclei are initialized and kept fixed at the locations of individuals. |
repdat
directory should contain at least three files
named exactly : coord.txt
, genotypes.txt
and
allele.numbers.txt
and containing respectively the spatial
coordinates, the genotypes and the number of alleles per locus.
These files should be plain ascci files with no header.
coordinates.txt
should have one line per individual
and two columns (x, y coordinates) missing value are not allowed.
genotypes.txt
should have one line per individual
and two columns per locus,
alleles should be coded as integer between 1 and the maximum number of
alleles for this locus, missing genotypic values should be coded as -999.
If your initial file as the required number of lines and columns but
with alleles coded with non-consecutive integers, then the function
FormatGenotypes
should be used first.
allele.numbers.txt
should have one line per locus an one column
giving number of alleles for consecutive loci.
Successive states of all blocks of parameters are written in files
contained in repout
and named after the type of parameters they contain.
All parameters processed by function mcmcFmodel
are
written in the directory specified by repout
as follows:
population.numbers.txt
contains values of the number of
populations (nchain
lines, one line per iteration of the MCMC algorithm)
nuclei.numbers.txt
contains the number of points in the Poisson
point process generating the Voronoi tessellation
color.nuclei.txt
contains vectors of integers of
length nppmax
coding the class membership of each Voronoi tile.
Vectors of class membership for successive states of the chain are
concatenated in one column. Some entries of the vector containing
clas membership for a current state may have missing values as the
actual number of polygon may be smaller that the maximum number allowed
nppmax
.
This file has nppmax*chain/stepw
lines
coord.nuclei.txt
contains coordinates of points in the Poisson
point process generating the Voronoi tessellation. It has
nppmax*chain/stepw
lines
and two columns (hor. and vert. coordinates).
drifts.txt
contains the drift factors for each
population, (one column per population).
ancestral.frequencies.txt
contains allele frequencies in ancestral
population. Each line contains all frequencies of the current state.
The file has nchain
lines.
In each line, values of allele frequencies are stored by increasing
allele index and and locus index (allele index varying first).
frequencies.txt
contains allele frequencies of present time
populations. Column xx contains frequencies of population numer xx.
In each column values of allele frequencies are stored by increasing
allele index and and locus index (allele index varying first), and
values of successive iterations are pasted.
The file has nallmax*nloc*nchain/stepw
lines where nallmax
is the maximum numer of alleles over all loci.
Poisson.process.rate.txt
contains rates of Poisson process
log.likelihood.txt
contains log-likelihood of data
for the current state of parameters of the Markov chain.
log.posterior.density.txt
contains log of posterior probability
(up to marginal density of data) of the
current state of parameters in the Markov chain.
Gilles Guillot
A spatial statistical model for landscape genetics, Guillot, Estoup, Mortier, Cosson, Genetics, 2005
Guillot, G., Geneland : A program for landscape genetics. Molecular Ecology Notes, submited.
# Below is a complete sequence # of commands using Geneland functions # we assume that Geneland is installed # and loaded by library("Geneland") # first look for a place to write # simulated data and MCMC outputs if(.Platform$OS == "unix"){ repdat = "/tmp/" repmcmc= "/tmp/" } if(.Platform$OS == "windows"){ repdat = "/temp/" repmcmc= "/temp/" } # Simulation of a dataset made of 2 populations # 2 loci and 2 alleles per locus simFmodel(nindiv=100, slim=c(0,1,0,1), npp=2, nloc=5, nall=c(5,5,5,5,5), nclass=2, drift=c(.3,.3), plots=FALSE, ploth=FALSE, seed=12345, write=TRUE, repout=repdat) # First run of MCMC algorithm # in order to get the posterior mode of the number of populations mcmcFmodel(repdat=repdat, repmcmc=repmcmc, lambdamax=100, dt=0, nclassmin=1, nclassinit=5, nclassmax=10, nppmax=200, nchain=5000, stepw=10, model="Dirichlet", varnclass=TRUE, spatial=TRUE) # Trace of number of populations # Should display a mode at 2 Plotnclass(repdat,repmcmc) # Then re-run the chain with fixed number of populations mcmcFmodel(repdat=repdat, repmcmc=repmcmc, lambdamax=100, dt=0, nclassmin=1, nclassinit=2, nclassmax=2, nppmax=200, nchain=5000, stepw=10, model="Dirichlet", varnclass=FALSE, spatial=TRUE) # Post-processing the chain PostProcessChain(repdat=repdat, repmcmc=repmcmc, nxdom=50, nydom=50, burnin=0) # Plots allele frequencies of allele #1 at locus #1 in sub-population #1 PlotFreq(repdat=repdat,repmcmc=repmcmc,ipop=1,iloc=1,iall=1) # Map of posterior probabilites # of population membership PlotTessellation(repdat=repdat,repmcmc=repmcmc) # Map of posterior mode # of population membership PosteriorMode(repdat=repdat,repmcmc=repmcmc, write=FALSE,plotit=TRUE)