BCE {BCE} | R Documentation |
estimates probability distributions of a sample composition based on
Rat
, containing biomarker ratios in (field)
samples, and
Dat
, containing the biomarker
ratios for several taxonomic groups
BCE(Rat, Dat, relsdRat = 0, abssdRat = 0, minRat = 0, maxRat = +Inf, relsdDat = 0, abssdDat = 0, tol = 1e-4, tolX = 1e-4, positive = 1:ncol(Rat), iter = 100, outputlength = 1000, burninlength = 0, jmpRat = 0.01, jmpX = 0.01, unif = FALSE, verbose = TRUE, initRat = Rat, initX = NULL, userProb = NULL, confInt = 2/3, export = FALSE, file = "BCE")
Rat |
initial ratio matrix. Each row of Rat contains the biomarker composition
of one taxon. As a result of the Bayesian procedure, this initial ratio matrix will be altered |
Dat |
initial data matrix. Each row of Dat contains the biomarker composition of
one (field) sample. |
relsdRat |
relative standard deviation on ratio matrix. Either one number or a matrix with
the same dimensions as Rat |
abssdRat |
absolute standard deviation on ratio matrix. Either one number or a matrix with
the same dimensions as Rat |
minRat |
minimum values of ratio matrix. Either one number or a matrix with
the same dimensions as Rat |
maxRat |
maximum values of ratio matrix. Either one number or a matrix with
the same dimensions as Rat |
relsdDat |
relative standard deviation on data matrix. Either one number or a matrix with
the same dimensions as Dat |
abssdDat |
absolute standard deviation on data matrix. Either one number or a matrix with
the same dimensions as Dat |
tol |
minimum standard deviation for data matrix Dat . One value |
tolX |
minimum x values. Used for MCMC initiation. One value |
positive |
A vector containing numbers of columns that should
contain strictly positive data. Only these columns are rescaled. The
other columns (not in positive ) are not rescaled, and can become
negative |
iter |
number of iterations for MCMC |
outputlength |
number of iterations kept in the output |
burninlength |
number of initial iterations to be removed from output |
jmpRat |
jump length of the ratio matrix Rat (in normal
space). Either a number, a vector with length equal to the number of
biomarkers (number of columns in Rat ), or a or matrix with the
same dimensions as the ratio matrix rat |
jmpX |
jump length of the composition matrix (in a
simplex). Either one number, a vector of length equal to the number of
taxa (number of rows in Rat )
or a matrix with the same dimensions = c(number of taxa, number of
field samples) |
unif |
logical; if TRUE a uniform distribution for ratio matrix is used. This is similar as in chemtax |
verbose |
logical; if TRUE, extra information is provided during the run of the function, such as extra warnings, elapsed time and expected time until the end of the MCMC |
initRat |
ratio matrix used to start the markov chain: defaults to the initial ratio matrix |
initX |
composition matrix used to start the markov chain: default the LSEI solution of Ax=B |
userProb |
posterior probability for a given ratio matrix and composition matrix: should be a function with 2 arguments RAT and X, and as returned value a number giving the -log posterior probability of ratio matrix RAT and composition matrix X. Dependence of the probability on the data should be incorporated in the function. If not specified, the default probability distribution is the gamma function |
confInt |
confidence interval in output; because the distributions may not be symmetrical, standard deviations are not always a useful measure; instead, upper and lower boundaries of the given confidence interval are given. Default is 2/3, i.e there is a probability of 0.66 for a value to be contained within the interval. |
export |
logical; if TRUE , the function
export.bce is called and a list of variables and plots
are exported to the specified file . |
file |
Only if export is TRUE. If not NULL, a
character string specifying the file to which objects are saved. |
The function \BCE
searches probability distributions for all
elements of a taxonomical composition matrix X
and a ratio
matrix Rat
for which:
X%*%Rat simeq Dat
It does this by returning iter
samples for X and Rat, organized
in three-dimensional arrays. The input
data matrix Dat
and ratio matrix Rat
should be
in the following formats, with the relative concentrations per
biomarker organized in columns:
data matrix:
marker1 | marker2 | marker3 | marker4 | |
sample1 | 0.14 | 0.005 | 0.35 | 0.033 |
sample2 | 0.15 | 0.004 | 0.36 | 0.034 |
sample3 | 0.13 | 0.004 | 0.31 | 0.030 |
sample4 | 0.13 | 0.005 | 0.33 | 0.031 |
sample5 | 0.14 | 0.008 | 0.33 | 0.036 |
sample6 | 0.11 | 0.082 | 0.34 | 0.044 |
and ratio matrix:
marker1 | marker2 | marker3 | marker4 | |
species1 | 0.27 | 0.13 | 0.35 | 0.076 |
species2 | 0.084 | 0 | 0.5 | 0.24 |
species3 | 0.195 | 0.3 | 0 | 0.1 |
species4 | 0.06 | 0 | 0 | 0 |
species5 | 0 | 0 | 0 | 0 |
species6 | 0 | 0 | 0 | 0 |
A bce (bayesian compositional estimator) object; a list containing 4 elements
Rat |
Array with dimension c(nrow(Rat ),ncol(Rat ),iter ) containing the random walk values of the ratio matrix Rat |
X |
Array with dimension c(nrow(X ),ncol(X ),iter ) containing the random walk values of the composition matrix X |
logp |
vector with length iter containing the random
walk values of the (log) posterior probability |
naccepted |
integer indicating the number of runs that were accepted |
Producing sensible output:
Markov Chain Monte Carlo simulations are not as straightforward as one might wish; several preliminary runs might be necessary to determine the desired number of iterations, burn-in length and jump length. For all estimated values of Rat and X, their trace (evolution of the values over all iterations) has to display random behaviour; no obvious trends should appear. A few parameters can be tuned to obtain such behaviour:
MCMC
, you
can find the number of accepted points under
MCMC$naccepted
. It is also given if you run the model with
verbose=TRUE
(default). This value should be somewhere
between 5% and 40%. For long runs, 5 % can be acceptable, for
short runs, you will prefer a higher acceptance in order to have
enough different points. 20% accepted is usually a good number. Do
some preliminary runs with iter=1000-10000
and tune the
jump length parameters jmpRat
and jmpX
. You can set
different jump lengths for each column of
the ratio matrix, or 1 jump length for the whole ratio matrix, and 1
jump length for the composition matrix. Decreasing the jump lengths
will generally increase the acceptance rate and vice versa. Also the
mixing rate (the speed with which accepted points change their
values) will be influenced. You want this mixing rate to be as high
as possible, whilst maintaining enough accepted points. burninlength
' cycles will not be written to the output. Look
at some plots to determine if you need to specify a burnin length.Karel Van den Meersche <k.vdmeersche@nioo.knaw.nl>, Karline Soetaert <k.soetaert@nioo.knaw.nl>
summary.bce
, plot.bce
,
export.bce
, pairs.bce
##==================================== # example using bceInput data # first try X <- BCE(bceInput$Rat,bceInput$Dat,relsdRat=.2,relsdDat=.2, iter=1000,outputlength=5000,jmpX=.01,jmpRat=.01) ## the number of accepted runs is too low; ## we play around with the jump lengths jmpx and jmprat X <- BCE(bceInput$Rat,bceInput$Dat,relsdRat=.2,relsdDat=.2, iter=1000,outputlength=5000,jmpX=.02,jmpRat=.002) ## we inspect the output: plot(X) ## For every element of X and Rat, we want to obtain a well-mixed, ## random trace. In this case, mixing is still a little poor. ## to optimize mixing in the ratio matrix, it is a good idea ## to make the jump length linear to the ratio matrix ## standard deviation (sdrat=.2*rat) : X <- BCE(bceInput$Rat,bceInput$Dat,relsdRat=.2,relsdDat=.2, iter=1000,outputlength=5000,jmpX=.02, jmpRat=.2*(.2*bceInput$Rat)) plot(X) ## mixing improved a lot; we repeat the run with more iterations ## to improve the reliability of the results. ## the following run can take a few minutes - so it is toggled off #X <- BCE(bceInput$Rat,bceInput$Dat,relsdRat=.2,relsdDat=.2, # iter=100000,outputlength=5000,jmpX=.02, # jmpRat=.2*(.2*bceInput$Rat)) #plot(X) ## you can see in the plots that traces for all elements of Rat and X ## are well-mixed. This run was saved in "bceOutput" Sum <-summary(bceOutput) # show results as mean with ranges print(Sum$meanX) # plot estimated means and ranges (lbX=lower, ubX=upper bound) xlim <- range(c(Sum$lbX,Sum$ubX)) # first the mean dotchart(x=t(Sum$meanX),xlim=xlim, main="Taxonomic composition", sub="using bce",pch=16) # then ranges nr <- nrow(Sum$meanX) nc <- ncol(Sum$meanX) for (i in 1:nr) {ip <-(nr-i)*(nc+2)+1 cc <- ip : (ip+nc-1) segments(t(Sum$lbX[i,]),cc,t(Sum$ubX[i,]),cc) } # show results as pairs plot pairs(bceOutput,sample=3,main="Station 3")