getXlist {MasterBayes}R Documentation

Design Matrices for the Multinomial Log-Linear Model

Description

Forms design matrices for each offspring, and stores other relavent inforamtion.

Usage

getXlist(PdP, GdP=NULL, A=NULL, E1=0.005, E2=0.005, mm.tol=999, ...)

Arguments

PdP PdataPed object
GdP optional GdataPed object
A optional list of allele frequencies. If not sepcified and GdP exists, allele frequencies are taken from GdP$G using extractA
E1 the probability of a dominant allele being scored as a recessive allele for dominant markers
E2 per-allele genotyping error rate. E2(2-E2) is the per-genotype rate defined in Kalinowski (2006) for codominant markers, and E2 is the probability of a recessive allele being scored as a dominant allele for dominant markers
mm.tol maximum number of genotype mismatches tolerated for potential parents
... further arguments to be passed

Details

This is the main R routine for setting up design matrices for the various models that may be defined in the formula argument of PdataPed. If a GdataPed object is passed to getXlist design matrices of genetic likelihoods are calculated (see fillX.G), and the number of mismatches between offspring and parental genotypes are stored (see mismatches). mm.tol specifies the maximum number of mismatches that are tolerated between an offspring and a parent. Parents that exceed this number of mismatches are excluded, and the design matrices for non-excluded parents are reordered by the number of mismatches. This increases the efficency of sampling from the multinomial distribution of parents, because high probability parents appear first.

Value

id vector of unique identifiers taken from PdP
beta_map index relating the vector of unique parameters to the columns of the design matrices
X list of design matrices and other information.

Note

Each element of X refers to an offspring (names(X)) and contains vectors for the set of potential parents (restdam.id and restsire.id) of each offspring. Also included are the set of individuals that may have been parents but have been excluded for certain reasons (dam.id and sire.id). Exclusion may have been based on the number of genotype mismatches, or it may have been on biological grounds (See the keep argument of varPed). Parental id's are stored as integers which correspond to the actual id's stored in id. Parental id's greater than the length of id refer to unsampled parents.

Six types of design matrix are used (XDus, XDs, XSus, XSs, XDSus, XDSs). XD.. are the design matrices for dams, and XS.. are the design matrices for sires. The rows of each design matrix are associated with indivdiuals in dam.id and sire.id, respectively. When interactions between dam and sire variables are modelled, or a varPed variable is created using the argument relational="MATE", the design matrices vary over parental combinations. XDS.. are the design matrices for parental combinations with sire's varying the fastest. Each of these three types of design matrix have two subclasses: s and us. s are design matrices which are fully observed, either beacuse unsampled parents do not exist or beacuse unsampled parents have known phenotypes (see argument USvar in varPed). us are for design matrices where the phenotypes of unsampled parents are unkown. The matrices XDus and Xsus have a row of NA's which correspond to the unsampled parent category. The design matrix XDSus will typically have many rows of NA's because each sampled parent may be paired to an unsampled indiviual.

When the argument gender=NULL is passed to varPed the respective columns in the dam and sire design matrices are associated with a single parameter. Because of this the number of parameters to be estimated may be less than the total number of columns in the 6 design matrices. beta_map relates a parameter vector to the columns of the design matrices. The columns of the design matrices are numbered in the order they are introduced in the preceding paragraph (i.e XDus through to XDSs). The parameter vector is ordered identically except parameters associated with genderless variables are omitted for males. par_order is similar to beta_map but relates the order of the parameters specified in the formula argument to PdataPed to the respective columns of the design matrices.

If the argument relational="OFFSPRING" is specified in varPed, or the set of potential parents varies over offspring, the design matrices will vary across offspring. For this reason I create a design matrix for each offspring irrespective of whether the matrices vary or not. The design matrices for the genetic likelihoods will always vary over offspring.

Author(s)

Jarrod Hadfield j.hadfield@ed.ac.uk

References

Hadfield J.D. et al (2006) Molecular Ecology 15 3715-31 Kalinowski S.T. et al (2006) Molecular Ecology in press Hadfield J. D. et al (2007) in prep

See Also

varPed, MCMCped

Examples

id<-1:20
sex<-sample(c("Male", "Female"),20, replace=TRUE)
offspring<-c(rep(0,18),1,1)
lat<-rnorm(20)
long<-rnorm(20)
mating_type<-gl(2,10, label=c("+", "-"))

test.data<-data.frame(id, offspring, lat, long, mating_type, sex)

res1<-expression(varPed("offspring", restrict=0))
var1<-expression(varPed(c("lat", "long"), gender="Male", 
  relational="OFFSPRING"))
var2<-expression(varPed(c("mating_type"), gender="Female", 
  relational="MATE"))
var3<-expression(varPed("mating_type", gender="Male"))

PdP<-PdataPed(formula=list(res1, var1, var2, var3), data=test.data)

X.list<-getXlist(PdP)
X.list$X$"19"$XSs

# For the first offspring we have the design matrix for sires
# The first column represents the distance between each male 
# and each offspring. The second column indicates the male's 
# mating type. Note that contrasts are set up with the first 
# male so the indicator variables may be negative.

matrix(X.list$X$"19"$XDSs, ncol=length(X.list$X$"19"$dam.id), 
   nrow=length(X.list$X$"19"$sire.id))

# incidence matrix indicating whether Females (columns) and Males (rows)
# are the same mating type. Again this is a contrast with the first 
# parental combination (which is +/+) so 0 actually represents parents
# with the same mating type.

[Package MasterBayes version 2.42 Index]