hyptest.network.BSA {tossm}R Documentation

Boundary setting algorithms

Description

hyptest.network.BSA, structure.BSA, wombsoft.BSA, and fixed.MU.BSA are examples of boundary-setting algorithms (BSAs) used to set management boundaries in a tossm simulation.

Usage


hyptest.network.BSA(gs,abund,var,C,landscape.poly,sample.polys,sig.level) 
structure.BSA(gs,abund,var,C,landscape.poly,sample.polys,mainparams,BSA.params)
wombsoft.BSA(gs,abund,var,C,landscape.poly,sample.polys,sig.level,min.MU.size)
fixed.MU.BSA(gs,abund,var,C,landscape.poly,sample.polys,n.mus)

Arguments

gs genetic samples; a list of arrays of the alleles at each locus for each individual animal from which genetic samples were taken, arranged by genetic sampling year and sampling polygon. Each list is a list of length = number of sampling polygons (sample.poly), and each element of these lists is an array of dimensions (number of genetic samples per sample.poly) X (number haploid loci + number of diploid loci) X (2 haplotypes)
abund matrix of abundance estimates for each management unit sampling year
var variance of these abundance estimates
C catches by simulation year and management unit
landscape.poly a landscape polygon object defining the study area extent (the bounding box of the bp.polys)
sample.polys a list of sampling polygons to be lumped or split into separate management units by the BSA
sig.level significance value threshold for setting a management boundary in hyptest.network.BSA
n.mus number of MUs to be created by fixed.MU.BSA
mainparams parameters to be used by the program Structure in structure.BSA. mainparams is a list consisting of the following components:
numinds: total number of individuals sampled
numloci: number of loci to be analyzed by Structure
ploidy: ploidy of loci to be analyzed by Structure
noadmix: whether populations are assumed to be admixed (1) or not (0)
freqscorr: whether allele frequencies between populations are correlated (1) or not (0)
maxpops: the maximum number of populations to be defined by Structure
burnin: the length of the burnin period used by Structure
burnin: the number of iterations of the MCMC chain to be used by Structure in assigning population membership
Further information regarding these parameters can be found in the Structure user's manual
BSA.params a list with the following components to be used by structure.BSA:
output.dir: the directory to which STRUCTURE results should be written
k: a list of the values of k (number of populations) to be evaluated
struct.exe.path: the directory where structure.exe resides and where main params will be written. The default value should be fine, so long as structure.exe is not moved subsequent to the installation of tossm
struct.outfile: the name of the output file to be written by STRUCTURE.

Details

hyptest.network.BSA uses the genetic data in gs to create networks of sampling polygons which are joined together by non-significant statistical tests (G-statistic, Goudet et al., 1996). This method was proposed by Waples and Gagiotti (2006), and is illustrated further there (p.1425).
hyptest.network.BSA checks for pairwise significance between all sampling polygons. Sampling polygons are joined together in a network if they do not show significant genetic differentiation (at the alpha level chosen by the user). Network membership extends to all sampling polygons connected by at least one non-significant test; ie. a sampling polygon might not be directly connected to all sampling polygons in its assigned network. Networks are separated from each other by exclusive significant tests with all other sampling polygon networks. Each resulting network constitutes a management unit.

structure.BSA uses the Bayesian clustering method implemented by the program STRUCTURE (Pritchard et al. 2000, Falush et al. 2002) to determine the number of MUs that should be defined. The BSA uses STRUCTURE to divide the samples into the numbers of populations specified by BSA.params$k and compares the log-likelihoods of the resulting groupings to determine the optimal number of MUs. The MU membership of each samplings site is then determined by examining the population membership, as defined by STRUCTURE, of each sample within the sampling site. The site is assigned to the MU to which the greatest number of samples belongs. Thus, if a sampling site contained 50 samples, 20 of which were assigned to population 1 and 30 of which were assigned to population 2, the entire sampling site would be assigned to MU 2. N0ote that a consequence of this approach is that it may result in the definition of fewer MUs than there are STRUCTURE-defined populations. For instance, if STRUCTURE defined 4 populations, but only assigned a few samples to population 4, there may be no sampling site in which the plurality of samples belongs to population 4. In this case, no sampling site would be assigned to MU 4 and there would effectively be only 2 MUS.
Once sampling sites have been assigned to MUs, the MU.polys are defined using the function MU.poly.generator
In order to use structure.BSA, the command-line version of STRUCTURE (i.e., without the graphical user interface), must first be installed. This program is available in UNIX, DOS, and MacOS X versions from the STRUCTURE homepage. The argument BSA.params$struct.exe.path is used to specify the directory in which STRUCTURE is installed.

wombsoft.BSA uses the R package wombsoft of Crida and Manel (2007). It first identifies regions within the study area where genetic gradients are relatively strong. The algorithm then uses a binomial test to assess the significance of these areas as genetic boundaries. The significance threshold of this test can be chosen by the user using the argument sig.level. The output of the wombsoft algorithm is then adapted in the following two ways:
1) If only a single, non-significant grid-cell interrupts a potential MU boundary, the boundary is drawn across such one-cell gaps.
2) If the size of an MU is less than a user-specified proportion of the total study area (min.MU.size), the MU is ignored. This alleviates the fact that the algorithm sometimes outputs very small (one or two grid cell) MUs.

fixed.MU.BSA simply splits the study region into equally-sized management units, and disregards all genetic information. The number of management units is specified by the user. When initially running TOSSM simulations, the user may find it useful to use fixed.MU.BSA for testing purposes.

How to write a BSA that will run within the run.tossm code

The BSA in essence needs to do two things: 1) analyze genetic and/or abundance data from run.tossm and, 2) set management units. The management units output from the BSA must completely cover the extent of the breeding population polygons, yet not overlap each other. They must be returned from the BSA in the form of a list, each element consisting of one management unit polygon. These polygons should be of class gpc.poly, implemented via gpclib.

The analysis is likely to be done exclusively by an external program not written in R. If your BSA is going to call outside genetic software, you will probably need to write the genetic data to disk in some specified format, then call the software, then read the results back in to R and reorganize them for return to run.tossm. Regardless, there needs to be at least some R code written, even if just a wrapper that allows the analytic method and run.tossm to interface.

There are a number of examples of mixing and migration models that have been proposed for use as analytic methods (Pritchard et al., 2000; Dupanloup et al, 2002; Guillot et al., 2005). It is foreseen that some of these methods can stand alone or be combined to supply the information desired to analyze tossm simulated data and define management units.

Note– there is an important distinction between two types of potential BSAs:

1) Those BSAs that simply assign each sample.poly to a management unit. In this type of BSA, statistical analysis of genetic samples will be done at the level of the sampling polygon. For such BSAs, the helper function MU.poly.generator described below may be of use to the BSA developer.

2) Those BSAs that work at the level of individual genetic samples. This type of BSA might disregard which sampling polygon samples come from. In this case, management unit boundaries could quite possibly bisect sampling polygons and/or, conversely, assign individuals from different sampling polygons to the same management unit. MU.poly.generator will not be of use to those developing such BSAs, and the developer is tasked with ensuring that a list of non-overlapping management unit polygons that cover the entire study area (=landscape.poly) is returned from the BSA.

Whatever the method used, the R code written for the BSA takes the following arguments:

My.BSA<- function(genetic samples, abundance estimates, variances, catches,
landscape.poly, sample.polys, optional param1, optional param2)

The first six of these arguments are passed automatically to the BSA function by run.tossm, so the function must have these six arguments whether it uses the values contained in them or not. They correspond to the first six arguments to hyptest.network.BSA, and are described in the arguments section above.

The last two arguments can optionally be used to pass any additional information that might be required to run the BSA function. For example, hyptest.network.BSA uses only the first optional parameter, an alpha level for genetic comparisons it performs, and fixed.MU.BSA also uses only one optional argument, n.mus, the number of management units the user wishes to define. Optional arguments for the BSA are specified within the run.tossm argument BSA.args. Each of the arguments (supplied by run.tossm or user-defined) can be named as the user wishes. The optional arguments can be of any type. Thus, if a BSA requires more than two additional parameters, the optional arguments can be used to pass lists of parameters.

Supplying further information to the BSAagg.gs.tseries

The function agg.gs.tseries can provide the BSA with the run.tossm genetic data from each sample.poly aggregated across years, the number of loci in the current simulation, and the number of alleles in each sample.poly. This information can be used by the BSA simply by placing following line first in the BSA function:

agg.gs.tseries() :creates the objects agg.gs,n.loci,n.areas ,and n.alleles

Here are details on the objects made available by agg.gs.tseries:

agg.gs
similar to gs (see arguments section above) except with genetic samples aggregated across years for each genetic sampling polygon. agg.gs is a list with length equal to the number of sampling polygons. Each list element consists of an array with dimensions (number of genetic samples, entire simulation) X (number haploid loci+number of diploid loci) X 2. Each element has a coords attribute that provides x and y coordinates for each sampled individual. The agg.gs object is also attributed with seq.list, a list of the unique haplotype states sampled in the simulation.

agg.gtypes
the same as agg.gs except that for each sampling polygon, the genetic data represented by a matrix with dimensions (number of genetic samples, entire simulation) X (number haploid loci+ 2*number of diploid loci)

n.loci
The number of loci used in the current simulation.

n.areas
The number of sampling polygons in the current simulation.

n.alleles
the number of alleles at each locus in each sampling polygon

BSAs that work by assigning entire sampling sites (rather than individual samples) to MUs can be assisted by the function MU.poly.generator, which creates MU polygons based on the MU membership of the sampling polygons. Further info on MU.poly.generator is available on its help page.

Value

munits–A list with each element consisting of a management unit, represented by a polygon of class gpc.poly.

References

Crida A., and Manel S. 2007. WOMBSOFT: an R package that implements the Wombling method to identify genetic boundary. Molecular Ecology Notes 7, 588–591

Dupanloup I., Scheider S. and Excoffier L. 2002 A simulated annealing approach to define the genetic structure of populations. Molecular Ecology 11, 2571–2581

Guillot G., Mortier F. and Estoup A. 2005 GENELAND: a computer package for landscape genetics. Molecular Ecology Notes 5, 712–715

Pritchard J.K., Stephens, M. and Donnelly P. 2000 Inference of population structure using multilocus genotype data. Genetics 155, 945–59.

Waples R.S. and Gaggiotti O. 2006 What is a population? An empirical evaluation of some genetic methods for identifiying the number of gene pools and their degree of connectivity. Molecular Ecology 15, 1519–1539.

See Also

tossm-package, run.tossm

Examples


#set up sampling and management schedule using def.make.schedule
schedule=def.make.schedule(n.pre.RMP=5,n.RMP=10,n.post.RMP=5,abund.gap=2)

#run example with BSA set to hyptest.network.BSA
example<-run.tossm(rland = example.landscape, bp.polys=example.bp.polys,
schedule = schedule,n.samples = 25, sample.polys=example.sample.polys, 
initial.depletion=0.30,historic.removals=NULL, 
BSA = hyptest.network.BSA, BSA.args = list(sig.level=.05),
harvest.interval=10)

[Package tossm version 1.2 Index]