BCE {BCE} R Documentation

Bayesian Composition Estimator

Description

Estimates probability distributions of a sample composition based on

  • an input ratio matrix, Rat, containing the biomarker ratios of several taxonomic groups, and
  • an input data matrix, Dat, containing the biomarker ratios in (field) samples.

    Usage

    BCE(Rat, Dat, relsdRat = 0, abssdRat = 0, minRat = 0, 
    maxRat = +Inf, relsdDat = 0, abssdDat = 0, tol = 1e-4, tolX = 1e-4, 
    positive = 1:ncol(Rat), iter = 100, outputlength = 1000, 
    burninlength = 0, jmpRat = 0.01, jmpX = 0.01, unif = FALSE, 
    verbose = TRUE, initRat = Rat, initX = NULL, userProb = NULL, 
    confInt = 2/3, export = FALSE, file = "BCE")

    Arguments

    Rat initial ratio matrix. Each row of Rat contains the biomarker composition of one taxon. As a result of the Bayesian procedure, this initial ratio matrix will be altered.
    Dat initial data matrix. Each row of Dat contains the biomarker composition of one (field) sample.
    relsdRat relative standard deviation on ratio matrix. Either one number or a matrix with the same dimensions as Rat
    abssdRat absolute standard deviation on ratio matrix. Either one number or a matrix with the same dimensions as Rat
    minRat minimum values of ratio matrix. Either one number or a matrix with the same dimensions as Rat
    maxRat maximum values of ratio matrix. Either one number or a matrix with the same dimensions as Rat
    relsdDat relative standard deviation on data matrix. Either one number or a matrix with the same dimensions as Dat
    abssdDat absolute standard deviation on data matrix. Either one number or a matrix with the same dimensions as Dat
    tol minimum standard deviation for the data matrix Dat. A single value.
    tolX minimum values of X, used to initialize the MCMC. A single value.
    positive a vector of the indices of the columns that should contain strictly positive data. Only these columns are rescaled; the other columns are not rescaled and can become negative.
    iter number of iterations for MCMC
    outputlength number of iterations kept in the output
    burninlength number of initial iterations to be removed from output
    jmpRat jump length for the ratio matrix Rat (in normal space). Either a single number, a vector with length equal to the number of biomarkers (number of columns in Rat), or a matrix with the same dimensions as Rat.
    jmpX jump length for the composition matrix (in a simplex). Either a single number, a vector with length equal to the number of taxa (number of rows in Rat), or a matrix with dimensions c(number of taxa, number of field samples).
    unif logical; if TRUE, a uniform distribution is used for the ratio matrix, similar to the approach in CHEMTAX.
    verbose logical; if TRUE, extra information is provided during the run of the function, such as extra warnings, elapsed time and expected time until the end of the MCMC
    initRat ratio matrix used to start the Markov chain; defaults to the initial ratio matrix Rat.
    initX composition matrix used to start the Markov chain; defaults to the LSEI solution of Ax=B.
    userProb posterior probability for a given ratio matrix and composition matrix: a function with two arguments, RAT and X, that returns a single number giving the -log posterior probability of ratio matrix RAT and composition matrix X. Any dependence of the probability on the data must be incorporated in the function itself. If not specified, the default probability distribution is the gamma function.
    confInt probability content of the interval reported in the output. Because the distributions may not be symmetrical, standard deviations are not always a useful measure; instead, the lower and upper bounds of an interval with this probability are given. Default is 2/3, i.e. there is a probability of 2/3 (about 0.67) that a value lies within the interval.
    export logical; if TRUE, the function export.bce is called and a list of variables and plots is exported to the specified file.
    file used only if export is TRUE. If not NULL, a character string specifying the file to which objects are saved.
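
Because the sampled distributions can be skewed, an interval defined by quantiles says more than mean ± standard deviation. The sketch below assumes that the bounds for confInt are the (1 - confInt)/2 and (1 + confInt)/2 quantiles of the sampled values of one element; this is an illustrative assumption, not necessarily how BCE computes them, and the gamma-distributed samples are made up.

```r
# Illustration only (assumption): central interval with probability confInt
# computed from a vector of MCMC samples of a single element of X or Rat.
set.seed(1)
samples <- rgamma(5000, shape = 2, rate = 10)   # made-up, right-skewed samples

confInt <- 2/3
probs <- c((1 - confInt) / 2, (1 + confInt) / 2)   # the 1/6 and 5/6 quantiles
bounds <- quantile(samples, probs = probs, names = FALSE)

# Fraction of the samples that fall inside the interval (about 2/3)
inside <- mean(samples >= bounds[1] & samples <= bounds[2])
print(c(lower = bounds[1], upper = bounds[2], inside = inside))
```

For a skewed distribution like this one, the interval is not centred on the mean, which is exactly the case where reporting bounds is more informative than a standard deviation.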

    Details

    The function BCE estimates probability distributions for all elements of a taxonomic composition matrix X and a ratio matrix Rat for which:

    X %*% Rat ≈ Dat

    It does this by returning iter MCMC samples of X and Rat, organized in three-dimensional arrays. The input data matrix Dat and ratio matrix Rat should be in the following format, with the relative concentrations of each biomarker in one column:

    data matrix:
                marker1  marker2  marker3  marker4
    sample1        0.14    0.005     0.35    0.033
    sample2        0.15    0.004     0.36    0.034
    sample3        0.13    0.004     0.31    0.030
    sample4        0.13    0.005     0.33    0.031
    sample5        0.14    0.008     0.33    0.036
    sample6        0.11    0.082     0.34    0.044

    and ratio matrix:
                marker1  marker2  marker3  marker4
    species1       0.27     0.13     0.35    0.076
    species2      0.084        0      0.5     0.24
    species3      0.195      0.3        0      0.1
    species4       0.06        0        0        0
    species5          0        0        0        0
    species6          0        0        0        0
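
The example matrices can be entered in base R to check the shapes behind X %*% Rat ≈ Dat: for the product to match Dat, X needs one row per sample and one column per taxon. The sketch also shows a function with the signature documented for userProb (arguments RAT and X, returning a -log posterior). The equal-share composition, the Gaussian error model, the flat priors and the way relsdDat and tol combine into a standard deviation are all assumptions made for illustration; this is not BCE's default probability.

```r
# Example matrices from the Details section, entered by hand
Dat <- matrix(c(0.14, 0.005, 0.35, 0.033,
                0.15, 0.004, 0.36, 0.034,
                0.13, 0.004, 0.31, 0.030,
                0.13, 0.005, 0.33, 0.031,
                0.14, 0.008, 0.33, 0.036,
                0.11, 0.082, 0.34, 0.044),
              nrow = 6, byrow = TRUE,
              dimnames = list(paste0("sample", 1:6), paste0("marker", 1:4)))

Rat <- matrix(c(0.27,  0.13, 0.35, 0.076,
                0.084, 0,    0.5,  0.24,
                0.195, 0.3,  0,    0.1,
                0.06,  0,    0,    0,
                0,     0,    0,    0,
                0,     0,    0,    0),
              nrow = 6, byrow = TRUE,
              dimnames = list(paste0("species", 1:6), paste0("marker", 1:4)))

# Hypothetical composition: one row per sample, one column per taxon,
# every row summing to 1 (equal shares of all taxa)
X <- matrix(1 / nrow(Rat), nrow = nrow(Dat), ncol = nrow(Rat),
            dimnames = list(rownames(Dat), rownames(Rat)))

pred <- X %*% Rat   # predicted biomarker ratios, same shape as Dat

# Hypothetical userProb-style function: Gaussian errors on Dat with a 20%
# relative standard deviation floored at tol, flat priors on RAT and X.
# Returns the -log posterior up to an additive constant.
relsdDat <- 0.2
tol <- 1e-4
sdDat <- pmax(relsdDat * Dat, tol)   # pmax keeps the matrix dimensions of Dat
negLogPost <- function(RAT, X) {
  resid <- (X %*% RAT - Dat) / sdDat
  0.5 * sum(resid^2)
}
negLogPost(Rat, X)
```

Because the function refers to Dat from the enclosing environment, the dependence on the data is built into the function, as the userProb argument requires.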

    Value

    A bce (Bayesian Composition Estimator) object: a list containing 4 elements:

    Rat Array with dimension c(nrow(Rat),ncol(Rat),iter) containing the random walk values of the ratio matrix Rat
    X Array with dimension c(nrow(X),ncol(X),iter) containing the random walk values of the composition matrix X
    logp vector with length iter containing the random walk values of the (log) posterior probability
    naccepted integer indicating the number of runs that were accepted
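
Since the random-walk values are stored with the iterations along the third dimension, posterior means and single-element traces can be extracted with base R array operations. The sketch builds a hypothetical stand-in list with the documented element names and shapes (random numbers, not real BCE output) and assumes that naccepted counts accepted steps out of iter iterations.

```r
# Hypothetical stand-in with the documented structure of a bce object;
# the numbers are random and only the shapes matter here.
set.seed(2)
nTaxa <- 3; nMarkers <- 4; nSamples <- 5; nIter <- 200
fake <- list(
  Rat = array(runif(nTaxa * nMarkers * nIter),
              dim = c(nTaxa, nMarkers, nIter)),
  X = array(runif(nSamples * nTaxa * nIter),
            dim = c(nSamples, nTaxa, nIter)),
  logp = rnorm(nIter),
  naccepted = 57L
)

# Posterior mean of every element: average over the iteration dimension
meanRat <- apply(fake$Rat, c(1, 2), mean)
meanX <- apply(fake$X, c(1, 2), mean)

# Trace of one element, e.g. X[1, 2], for a visual check of mixing
trace12 <- fake$X[1, 2, ]

# Acceptance rate, assuming naccepted is counted out of nIter iterations
accRate <- fake$naccepted / nIter
```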

    Note

    Producing sensible output:

    Markov chain Monte Carlo simulations are not as straightforward as one might wish; several preliminary runs may be necessary to determine the appropriate number of iterations, burn-in length and jump lengths. For every estimated value of Rat and X, the trace (the evolution of the value over all iterations) should look random, with no obvious trends. A few parameters can be tuned to obtain such behaviour: the number of iterations (iter), the burn-in length (burninlength) and the jump lengths (jmpRat and jmpX).
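
The effect of the jump length on acceptance and mixing can be seen with a generic random-walk Metropolis sampler. The sketch below is not BCE's sampler: it targets a one-dimensional standard normal distribution, and the function name rwMetropolis and all settings are hypothetical. Very small jumps are almost always accepted but the trace drifts slowly; very large jumps are mostly rejected and the trace stays stuck; both give a lag-1 autocorrelation close to 1, i.e. poor mixing.

```r
# Generic random-walk Metropolis sampler for a standard normal target
# (hypothetical illustration of jump-length tuning, not BCE's algorithm)
rwMetropolis <- function(n, jump, x0 = 0) {
  x <- numeric(n)
  cur <- x0
  accepted <- 0
  for (i in seq_len(n)) {
    prop <- cur + rnorm(1, sd = jump)   # symmetric proposal
    logRatio <- dnorm(prop, log = TRUE) - dnorm(cur, log = TRUE)
    if (log(runif(1)) < logRatio) {     # Metropolis acceptance rule
      cur <- prop
      accepted <- accepted + 1
    }
    x[i] <- cur
  }
  list(trace = x, acceptRate = accepted / n)
}

set.seed(3)
small <- rwMetropolis(5000, jump = 0.05)  # tiny steps: slow drift
medium <- rwMetropolis(5000, jump = 2.4)  # intermediate steps
large <- rwMetropolis(5000, jump = 20)    # huge steps: mostly rejected

# Lag-1 autocorrelation of a trace; values near 1 indicate poor mixing
lag1 <- function(tr) acf(tr, lag.max = 1, plot = FALSE)$acf[2]
sapply(list(small = small, medium = medium, large = large),
       function(r) c(accept = r$acceptRate, lag1 = lag1(r$trace)))
```

In the same spirit, the Examples below first change jmpX and jmpRat until the acceptance is reasonable and then inspect the traces with plot().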

    Author(s)

    Karel Van den Meersche <k.vdmeersche@nioo.knaw.nl>, Karline Soetaert <k.soetaert@nioo.knaw.nl>

    See Also

    summary.bce, plot.bce, export.bce, pairs.bce

    Examples

    ##====================================

    library(BCE)

    # example using the bceInput data
    # first try
    
    X <- BCE(bceInput$Rat,bceInput$Dat,relsdRat=.2,relsdDat=.2,
             iter=1000,outputlength=5000,jmpX=.01,jmpRat=.01)
    
    ## the number of accepted runs is too low,
    ## so we adjust the jump lengths jmpX and jmpRat
    
    X <- BCE(bceInput$Rat,bceInput$Dat,relsdRat=.2,relsdDat=.2,
             iter=1000,outputlength=5000,jmpX=.02,jmpRat=.002)
    
    ## we inspect the output:
    plot(X)
    
    ## For every element of X and Rat, we want to obtain a well-mixed,
    ## random trace. In this case, mixing is still somewhat poor.
    ## To improve mixing in the ratio matrix, it is a good idea
    ## to make the jump length proportional to the standard deviation
    ## of the ratio matrix (sdRat = 0.2*Rat):
    X <- BCE(bceInput$Rat,bceInput$Dat,relsdRat=.2,relsdDat=.2,
             iter=1000,outputlength=5000,jmpX=.02,
             jmpRat=.2*(.2*bceInput$Rat))
    plot(X)
    
    ## mixing improved a lot; we repeat the run with more iterations
    ## to improve the reliability of the results.
    ## the following run can take a few minutes, so it is commented out
    #X <- BCE(bceInput$Rat,bceInput$Dat,relsdRat=.2,relsdDat=.2,
    #         iter=100000,outputlength=5000,jmpX=.02,
    #         jmpRat=.2*(.2*bceInput$Rat))
    #plot(X)
    ## the plots show that the traces for all elements of Rat and X
    ## are well mixed. The output of this run is saved in "bceOutput"
    
    Sum <- summary(bceOutput)

    # show the estimated mean composition
    print(Sum$meanX)
    
    # plot estimated means and ranges (lbX = lower, ubX = upper bound)
    xlim <- range(c(Sum$lbX, Sum$ubX))

    # first the means
    dotchart(x = t(Sum$meanX), xlim = xlim,
             main = "Taxonomic composition",
             sub = "using bce", pch = 16)

    # then the ranges, drawn as horizontal segments at the dot positions
    # of each group
    nr <- nrow(Sum$meanX)
    nc <- ncol(Sum$meanX)

    for (i in 1:nr) {
      ip <- (nr - i) * (nc + 2) + 1
      cc <- ip:(ip + nc - 1)
      segments(Sum$lbX[i, ], cc, Sum$ubX[i, ], cc)
    }
    
    # show results as pairs plot
    pairs(bceOutput,sample=3,main="Station 3")
    
    

    [Package BCE version 1.1 Index]