mcmcFmodel {Geneland}R Documentation

Inference in a spatial statistical model

Description

Markov Chain Monte-Carlo inference in the spatial F-model

Usage

mcmcFmodel(coordinates,genotypes,ploidy,
path.mcmc, rate.max, delta.coord, npopmin, npopinit,
npopmax, nb.nuclei.max, nit, thinning, freq.model, varnpop, spatial,jcf,
filter.null.alleles)

Arguments

coordinates Spatial coordinates of individuals. A matrix with 2 columns and one line per individual.
genotypes Genotypes of individuals. A matrix with one line per individual and 2 columns (resp. one column) per locus for diploid data (resp. haploid data).
ploidy An integer equal to 1 for haploid data and 2 for diploid data.
path.mcmc Path to output files directory. It seems that the path should be given in the Unix style even under Windows (use / instead of \). This path *has to* end with a slash (/) (e.g. path.mcmc="/home/me/Geneland-stuffs/")
rate.max Maximum rate of Poisson process (real number >0). Setting rate.max equal to the number of individuals in the dataset has proved to be efficient in many cases.
delta.coord Parameter prescribing the amount of unctertainty attached to spatial coordinates. If delta.coord=0 spatial coordinates are consiered as true coordinates, if delta.coord>0 it is assumed that observed coordinates are true coordinates blurred by an additive noise uniform on a square of side delta.coord centered on 0.
npopmin Minimum number of populations (integer >=1)
npopinit Initial number of populations ( integer sucht that npopmin =< npopinit =< npopmax)
npopmax Maximum number of populations (integer >= npopinit). There is no obvious rule to select npopmax, it should be set to a value larger than any value that you can reasonably expect for your data.
nb.nuclei.max Integer: Maximum number of nuclei in the Poisson-Voronoi tessellation. A good guess consists in setting this value equal to 3*rate.max. Lower values can also be used in order to speed up computations. The relevance of the value set can be checked by inspection of the MCMC run. The number of tiles should not go too close to nb.nuclei.max. If it does, you should re-run your chain with a larger value for nb.nuclei.max. In case of use of the option SPATIAL=FALSE, nb.nuclei.max should be set equal to the number of individuals.
nit Number of MCMC iterations
thinning Number of MCMC iterations between two writing steps (if thinning=1, all states are saved whereas if e.g. thinning=10 only each 10 iteration is saved)
freq.model Character: "Falush" or "Dirichlet" (model for frequencies). See also details in detail section of Geneland help page.
varnpop Logical: if TRUE the number of class is treated as unknown and will vary along the MCMC inference, if FALSE it will be fixed to the initial value npopinit. varnpop = TRUE *should not* be used in conjunction with freq.model = "Falush" as in this case it seems that large numbers of populations are not penalized enough and there is a serious risk of inferring spurious sub-populations.
spatial Logical: if TRUE the colored Poisson-Voronoi tessellation is used as a prior for the spatial organisation of populations. If FALSE, all clustering receive equal prior probability. In this case spatial information (i.e coordinates) are not used and the locations of the nuclei are initialized and kept fixed at the locations of individuals.
jcf Logical: if true update of c and f are performed jointly
filter.null.alleles Logical: if TRUE, tries to filter out null allele. An extra fictive null allele is created at each locus coding for all putative null allele. Its frequency is estimated and can be viewed with function PlotFreq. This option is available only with freq.model="Dirichlet".

Value

Successive states of all blocks of parameters are written in files contained in path.mcmc and named after the type of parameters they contain.

Storage format

All parameters processed by function mcmcFmodel are written in the directory specified by ‘path.mcmc’ as follows:

  • File ‘population.numbers.txt’ contains values of the number of populations (nit lines, one line per iteration of the MCMC algorithm)
  • File ‘nuclei.numbers.txt’ contains the number of points in the Poisson point process generating the Voronoi tessellation
  • File ‘color.nuclei.txt’ contains vectors of integers of length nb.nuclei.max coding the class membership of each Voronoi tile. Vectors of class membership for successive states of the chain are concatenated in one column. Some entries of the vector containing clas membership for a current state may have missing values as the actual number of polygon may be smaller that the maximum number allowed nb.nuclei.max. This file has nb.nuclei.max*chain/thinning lines
  • File ‘coord.nuclei.txt’ contains coordinates of points in the Poisson point process generating the Voronoi tessellation. It has nb.nuclei.max*chain/thinning lines and two columns (hor. and vert. coordinates).
  • File ‘drifts.txt’ contains the drift factors for each population, (one column per population).
  • File ‘ancestral.frequencies.txt’ contains allele frequencies in ancestral population. Each line contains all frequencies of the current state. The file has nit lines. In each line, values of allele frequencies are stored by increasing allele index and and locus index (allele index varying first).
  • File ‘frequencies.txt’ contains allele frequencies of present time populations. Column xx contains frequencies of population numer xx. In each column values of allele frequencies are stored by increasing allele index and and locus index (allele index varying first), and values of successive iterations are pasted. The file has nallmax*nloc*nit/thinning lines where nallmax is the maximum numer of alleles over all loci.
  • File ‘Poisson.process.rate.txt’ contains rates of Poisson process
  • File ‘hidden.coord.txt’ contains the coordinates of each individual as updated along the chain if those given as input are not considered as exact coordinates (which is specified by delta.coord to a non zero value).
  • File ‘log.likelihood.txt’ contains log-likelihood of data for the current state of parameters of the Markov chain.
  • File ‘log.posterior.density.txt’ contains log of posterior probability (up to marginal density of data) of the current state of parameters in the Markov chain.

    Author(s)

    Gilles Guillot

    References

    G. Guillot, Estoup, A., Mortier, F. Cosson, J.F. A spatial statistical model for landscape genetics. Genetics, 170, 1261-1280, 2005.

    G. Guillot, Mortier, F., Estoup, A. Geneland : A program for landscape genetics. Molecular Ecology Notes, 5, 712-715, 2005.

    See Also

    simFmodel

    Examples

      # Below is a sequence 
      # of R commands using Geneland functions
    
      # we assume that Geneland is installed
      # and loaded by library("Geneland")
    
     # Simulation of a dataset made of 2 populations
     # 5 loci and 10 alleles per locus
     # on a spatial domain enclosed in a rectangle
     # (x coord. in [0,3], y coord. in [0,1])
     # domain of populations are centered around 
     # points (0.5,0.5) and (2.5,0.5) respectively 
     # (so that the boundary is the vertical line x = 1.5)
    sim = simFmodel(nindiv=100,
              coord.lim=c(0,3,0,1),
              number.nuclei=2,
              coord.nuclei=cbind(c(0.5,0.5),c(2.5,0.5)),
              color.nuclei=c(1,2),
              nall=rep(10,5),
              npop=2,
              drift=c(.3,.3),
              plots=FALSE,
              ploth=FALSE,
              seed=123)
    
      # you can check the simulated dataset by
    ls()  ## gives a list of existing objects
    
    summary(sim) ## describes what object sim actually contains
    
      #  define a place for MCMC outputs
    path.mcmc = paste(tempdir(),"/",sep="")
    
    ## Not run: 
      # First run of MCMC algorithm
      # in order to get the posterior mode of the number of populations
    
    mcmcFmodel(sim$coordinates,
               sim$genotypes,
               path.mcmc=path.mcmc,
               rate.max=100,
               delta.coord=0,
               npopmin=1,
               npopinit=5,
               npopmax=10,
               nb.nuclei.max=200,
               nit=50000,
               thinning=50,
               freq.model="Dirichlet",
               varnpop=TRUE,
               spatial=TRUE)
    
      # Trace of number of populations
      # Should display a mode at 2
    Plotnpop(path.mcmc)
    
     # Then re-run the chain with fixed number of populations
    mcmcFmodel(sim$coordinates,
               sim$genotypes,
               path.mcmc=path.mcmc,
               rate.max=100,
               delta.coord=0,
               npopmin=1,
               npopinit=2,
               npopmax=2,
               nb.nuclei.max=200,
               nit=5000,
               thinning=10,
               freq.model="Dirichlet",
               varnpop=FALSE,
               spatial=TRUE)
    
       # Post-processing the chain 
    PostProcessChain(sim$coordinates,sim$genotypes,
                      path.mcmc=path.mcmc,
                      nxdom=50,
                      nydom=50,
                      burnin=0)
    
       # Plots allele frequencies of allele #1 at locus  #1 in sub-population  #1
     PlotFreq(sim$allele.numbers,
    path.mcmc=path.mcmc,ipop=1,iloc=1,iall=1)
    
       # Map of posterior probabilites
       # of population membership
     PlotTessellation(sim$coordinates,path.mcmc=path.mcmc)
    
       # Map of posterior mode 
       # of population membership
     PosteriorMode(sim$coordinates,path.mcmc=path.mcmc,
                   write=FALSE,plotit=TRUE)
    
     # read the estimated population for indivuals
     pppm.indiv <- read.table(paste(path.mcmc,"modal.pop.indiv.txt",sep=""))
    
     # add a plot of the individual colored as their estimated population of origin
    points(sim$coordinates,col=pppm.indiv[,3],pch=16)
    
    ## End(Not run)
    
    

    [Package Geneland version 2.0.10 Index]