bicreg.qtl {BayesQTLBIC} R Documentation

Bayesian QTL analysis using the BIC criterion

Description

Bayesian multi-locus QTL analysis based on Bayesian model selection in linear models using the BIC criterion to calculate approximate posterior probabilities for models.

Usage

bicreg.qtl(x, y, wt = rep(1, length(y)), strict = TRUE, OR = 1000, 
    maxCol = 31, drop.factor.levels = TRUE, nvmax = 4, nbest = 10, 
    intercept = TRUE, do.occam = FALSE, n = length(y)/num.imputations, 
    num.imputations = 1, prior = 0.5, delta = 1, p.sg = 1, eval.markers = TRUE, 
    neval = NULL, keep.size = 1, method = c("regsubsets", "leaps")[1]) 
bicreg2(x, y, wt = rep(1, length(y)), strict = FALSE, OR = 1000, 
    maxCol = 31, drop.factor.levels = TRUE, nvmax = 4, nbest = 10, 
    intercept = TRUE, do.occam = TRUE, n = length(y)/num.imputations, 
    num.imputations = 1, prior = 0.5, delta = 1, p.sg = 1, eval.markers = FALSE, 
    neval = NULL, keep.size = -1, method = c("regsubsets", "leaps")[1]) 

Arguments

x a matrix of independent variables, based on marker genotypes, often from a single chromosome
y a vector of values for the dependent variable (trait values)
wt a vector of weights for regression
strict logical. FALSE returns all models whose posterior model probability is within a factor of 1/OR of that of the best model. TRUE returns a more parsimonious set of models, where any model with a more likely submodel is eliminated.
OR a number specifying the maximum ratio for excluding models in Occam's window
maxCol a number specifying the maximum number of columns in the design matrix (including the intercept) to be kept, i.e. maximum number of markers to include
drop.factor.levels logical. Indicates whether factor levels can be individually dropped in the stepwise procedure to reduce the number of columns in the design matrix, or if a factor can be dropped only in its entirety.
nvmax maximum number of variables in a model
nbest a value specifying the number of models of each size returned to bicreg.qtl by the leaps algorithm.
intercept logical. If TRUE, add an intercept term
do.occam logical. If TRUE, apply Occam's window selection to the set of models
n original sample size, before multiple imputations
num.imputations number of imputations used to construct x, y
prior a vector or scalar specifying prior probabilities per marker for a QTL to be in the vicinity of the marker; generally proportional to the distance to flanking markers and to the total number of QTL expected in the genome. Defaults to 0.5, which is usually too high.
delta adjustment factor for the penalty term in the BIC criterion (cf. Broman and Speed 2002). The default, delta=1, is no adjustment; an adjustment is not needed if using subjective prior probabilities and sample size is ample (p.sg=1 and n >= 100; Ball 2007).
p.sg proportion genotyped (p.sg/2 per tail), if selective genotyping is being used, default 1, corresponding to fully genotyped population
eval.markers evaluate model averaged estimates for marker effects (effects of allelic substitution)
neval number of top models used to evaluate model averaged estimates of marker effects; the default, NULL, uses all models
keep.size keep models up to this size regardless of Occam's razor criterion, e.g. to ensure the intercept only model is available for comparison
method the subset search method to use, either "regsubsets" (the default) or "leaps"
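
The Occam's window rule described for strict = FALSE and OR can be sketched in base R as follows. This is an illustration only, not the package's internal code: the posterior probabilities are made up, and renormalising the retained models is an assumption made for the example. The extra pruning done when strict = TRUE is not shown.

```r
# Hypothetical posterior model probabilities (made-up values)
postprob <- c(M1 = 0.60, M2 = 0.25, M3 = 0.1496, M4 = 0.0004)
OR <- 1000

# Keep models whose probability is within a factor 1/OR of the best model
keep <- postprob >= max(postprob) / OR

# Renormalise over the retained models (assumption for this sketch)
postprob.kept <- postprob[keep] / sum(postprob[keep])

keep
postprob.kept
```

A larger OR widens the window and retains more low-probability models; a smaller OR gives a shorter list.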

Details

Provides posterior probabilities for linear models representing alternative QTL genetic architectures, which can be used for Bayesian inference of the number of QTL and probabilities for QTL presence in a region. Provides Bayesian model averaged estimates for effects of QTL or effects of allelic substitution for markers which may be linked to QTL. Posterior probabilities are estimated from the BIC criterion combined with prior information, with adjustments for multiple imputation and selective genotyping. The posterior probability for model M_i is given by:

Pr(M_i) ∝ exp(-BIC_i/2) × π(M_i)

where BIC_i is the value of the BIC criterion and π(M_i) is the prior probability for M_i.
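
As a minimal base R sketch of this formula, the weights exp(-BIC_i/2) × π(M_i) can be computed on the log scale and normalised to sum to one. The BIC values and model priors below are hypothetical and chosen only for illustration; they are not output from bicreg.qtl.

```r
# Hypothetical BIC values and prior probabilities for four candidate models
bic   <- c(M0 = 0, M1 = -12.3, M2 = -15.8, M3 = -14.1)
prior <- c(M0 = 0.64, M1 = 0.16, M2 = 0.16, M3 = 0.04)

# Work on the log scale and subtract the maximum before exponentiating,
# so that large absolute BIC values do not overflow or underflow
log.w    <- -bic / 2 + log(prior)
w        <- exp(log.w - max(log.w))
postprob <- w / sum(w)

round(postprob, 4)
```

With equal priors, as for M1 and M2 here, the ratio of posterior probabilities depends only on the BIC difference: postprob["M2"] / postprob["M1"] equals exp((bic["M1"] - bic["M2"]) / 2).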

Missing marker values can be estimated by multiple imputation, conditional on flanking markers, using impute.markers; the imputed data are then used as x, y.
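
The default n = length(y)/num.imputations suggests that the imputed data sets are stacked row-wise before being passed as x, y. The base R sketch below illustrates that bookkeeping under this assumption. The imputed matrices are random stand-ins rather than output from the package's imputation function, and all object names are hypothetical.

```r
set.seed(1)
n.orig <- 6   # original number of progeny
k      <- 3   # number of imputations

y <- rnorm(n.orig)

# Stand-ins for k imputed marker matrices (two markers, coded 0/1)
x.imputed <- lapply(seq_len(k), function(i)
  matrix(rbinom(n.orig * 2, 1, 0.5), ncol = 2,
         dimnames = list(NULL, c("m1", "m2"))))

# Stack the imputed data sets and repeat the trait values to match
x.stacked <- do.call(rbind, x.imputed)
y.stacked <- rep(y, times = k)

# Original sample size recovered as in the default for n
n <- length(y.stacked) / k
n
```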

For selectively genotyped populations (Darvasi and Soller 1992) an adjustment is made to the BIC criterion. Asymptotic convergence is good for fully genotyped families with n >= 100 progeny but requires larger sample sizes for smaller proportions of the tails (p.sg) genotyped.

Value

bicreg.qtl returns an object of class bicreg.qtl.
The function summary is used to print a summary of the results.
An object of class bicreg.qtl inherits from class bicreg and is a list containing at least the following components/attributes:

postprob the posterior probabilities of the models selected
namesx the names of the variables
label labels identifying the models selected
r2 R2 values for the models
bic values of BIC for the models
size the number of independent variables in each of the models
which a logical matrix with one row per model and one column per variable indicating whether that variable is in the model
probne0 the posterior probability that each variable is non-zero (in percent)
n the sample size before multiple imputation
postprob.size the marginal posterior probabilities for model sizes
postmean the posterior mean of each coefficient (from model averaging)
postsd the posterior standard deviation of each coefficient (from model averaging)
condpostmean the posterior mean of each coefficient conditional on the variable being included in the model
condpostsd the posterior standard deviation of each coefficient conditional on the variable being included in the model
ols matrix with one row per model and one column per variable giving the OLS estimate of each coefficient for each model
se matrix with one row per model and one column per variable giving the standard error of each coefficient for each model
reduced a logical indicating whether any variables were dropped before model averaging
dropped a vector containing the names of those variables dropped before model averaging
call the matched call that created the bicreg.qtl object
intercept logical indicating whether an intercept term was added
num.imputations the number of multiple imputations assumed
p.sg the proportion genotyped (p.sg/2 per tail)
delta the value of delta used
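
To show how several of these components relate to one another, the following base R sketch computes model averaged quantities from hypothetical values of postprob, which and ols. It assumes the usual model averaging conventions (coefficients taken as 0 in models that exclude the variable) and is not taken from the package source.

```r
# Hypothetical results for three models over two markers
postprob <- c(0.5, 0.3, 0.2)
which <- matrix(c(TRUE,  FALSE,
                  TRUE,  TRUE,
                  FALSE, TRUE), ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("m1", "m2")))
# OLS estimates, set to 0 where the marker is not in the model
ols <- matrix(c(0.40, 0,
                0.30, 0.20,
                0,    0.50), ncol = 2, byrow = TRUE,
              dimnames = list(NULL, c("m1", "m2")))

# Posterior probability (in percent) that each coefficient is non-zero
probne0 <- 100 * colSums(postprob * which)

# Model averaged posterior mean, and the mean conditional on inclusion
postmean     <- colSums(postprob * ols)
condpostmean <- postmean / (probne0 / 100)

# Marginal posterior probabilities for model sizes
size          <- rowSums(which)
postprob.size <- tapply(postprob, size, sum)
```

Here postprob is recycled down the rows of each column, so each model's row is weighted by its posterior probability before summing.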

Author(s)

R.D. Ball (rod.ball@AT@scionresearch.com), based on bicreg from the BMA package by Adrian Raftery, Chris Volinsky, and Ian Painter.

References

Ball, R. D. 2001: Bayesian methods for QTL mapping based on model selection: approximate analysis using the Bayesian Information Criterion. Genetics 159: 1351–1364.

Ball, R.D. 2007: Quantifying evidence for candidate gene polymorphisms—Bayesian analysis combining sequence-specific and QTL co-location information. Genetics 177: 2399–2416.

DeSilva, H.N., and Ball, R.D. 2007: Linkage disequilibrium mapping concepts. Chapter 7, pp. 103–132. In: Association Mapping in Plants, N.C. Oraguzie, E.H.A. Rikkerink, S.E. Gardiner, and H.N. DeSilva (Editors), Springer, New York.

Ball, R.D. 2007: Statistical analysis and experimental design. Chapter 8, pp. 133–196. In: Association Mapping in Plants, N.C. Oraguzie, E.H.A. Rikkerink, S.E. Gardiner, and H.N. DeSilva (Editors), Springer, New York.

Bogdan, M., Ghosh, J. K., and Doerge, R. W. 2004: Modifying the Schwarz Bayesian information criterion to locate multiple interacting quantitative trait loci. Genetics 167: 989–999.

Broman, K.W. and Speed, T.P. 2002: A model selection approach for the identification of quantitative trait loci in experimental crosses (with discussion). J Roy Stat Soc B 64: 641–656, 731–775.

Darvasi, A. and Soller, M. 1992: Selective genotyping for determination of linkage between a locus and a quantitative trait locus. Theoretical and Applied Genetics 85: 353–359.

Raftery, A. E. 1995: Bayesian model selection in social research (with Discussion). Sociological Methodology 1995 (Peter V. Marsden, ed.), pp. 111–196, Cambridge, Mass.: Blackwells.

See Also

bicreg, summary.bicreg.qtl, DS.gamma, recalc.bicprobs, impute.marker

Examples

# simulated backcross progeny: 2 chromosomes, markers every 10 cM from 5 to 105 cM
library(BayesQTLBIC)
set.seed(1234)
ex1.marker.pos <- seq(5, 105, by = 10)
chrom <- rep(1:2, rep(length(ex1.marker.pos), 2))
# 5 QTL: 3 on chromosome 1 (40, 50, 80 cM) and 2 on chromosome 2 (30, 55 cM)
ex1.qtldata <- sim.bc.progeny(n = 1200, Vp = c(0.1, 0.2, 0.3, 0.15, 0.25)/2,
                              map.pos = list(chrom = chrom,
                                             pos = rep(ex1.marker.pos, 2)),
                              qtl.pos = list(chrom = rep(1:2, c(3, 2)),
                                             pos = c(40, 50, 80, 30, 55)))
# analyse the chromosome 1 markers only
ex1n1200c1.bicreg <- bicreg.qtl(x = ex1.qtldata$x[, chrom == 1], y = ex1.qtldata$y,
                                OR = 1000, nbest = 10, nvmax = 5, prior = 0.2,
                                keep.size = 1)
# 23 models account for 99% of the probability
cumsum(ex1n1200c1.bicreg$postprob)
# 2 QTL in coupling at 40,50cM can't be resolved
summary(ex1n1200c1.bicreg, nbest = 23)

[Package BayesQTLBIC version 1.0-0 Index]