baseline {FTICRMS} | R Documentation |
Computes an estimated baseline curve for a spectrum by a method of Rocke and Xi generalized by Barkauskas and Rocke.
baseline(spect, init.bd, sm.par = 1e-11, sm.ord = 2, max.iter = 40, tol = 5e-8, sm.div = NA, sm.norm.by = c("baseline", "overestimate", "constant"), neg.div = NA, neg.norm.by = c("baseline", "overestimate", "constant"), rel.conv.crit = TRUE, zero.rm = TRUE, halve.search = FALSE)
spect |
vector containing the intensities of the spectrum |
init.bd |
initial value for baseline; default is flat baseline at median height |
sm.par |
smoothing parameter for baseline calculation |
sm.ord |
order of derivative to kill in baseline analysis |
max.iter |
maximum number of iterations allowed in baseline calculation |
tol |
convergence criterion; see below |
sm.div |
smoothness divisor in baseline calculation |
sm.norm.by |
method for smoothness penalty in baseline analysis |
neg.div |
negativity divisor in baseline calculation |
neg.norm.by |
method for negativity penalty in baseline analysis |
rel.conv.crit |
logical; whether convergence criterion should be relative to current baseline estimate |
zero.rm |
logical; whether to replace zeros with average of surrounding values |
halve.search |
logical; whether to use a halving-line search if step leads to smaller value of function |
If the spectrum is given by y[i], then the algorithm works by maximizing the objective function
F({b[i]}) = sum_{i=1}^{n}b[i] - sum_{i=2}^{n-1}A[1,i]*(b[i-1]-2b[i]+b[i+1])^2 - sum_{i=1}^n A[2,i]*[max{b[i]-y[i],0}]^2
using Newton's method with embedded halving line search using starting value
b[i] = median(spect)
for all i. The middle term controls the smoothness of the baseline and the last term applies a “negativity penalty” when the baseline is above the spectrum.
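The objective function above can be sketched in base R as follows. This is an illustration only, not the package's code: the names `objective` and `halving_step`, the toy spectrum, and the unit weights `A1` and `A2` are assumptions made for the example, and the line search is one plausible reading of a "halving line search" (halve the step until the objective no longer decreases).

```r
# Objective F(b) as written above, using second differences.
# A1 and A2 are vectors of length n; only entries 2..(n-1) of A1 are used.
objective <- function(b, y, A1, A2) {
  n    <- length(b)
  d2   <- diff(b, differences = 2)   # b[i-1] - 2*b[i] + b[i+1] for i = 2..(n-1)
  over <- pmax(b - y, 0)             # nonzero only where the baseline exceeds the spectrum
  sum(b) - sum(A1[2:(n - 1)] * d2^2) - sum(A2 * over^2)
}

# Hypothetical halving line search: try the full step, then halve it
# until the objective is at least as large as at the current estimate.
halving_step <- function(b, step, y, A1, A2, max.halvings = 20) {
  f0 <- objective(b, y, A1, A2)
  for (k in 0:max.halvings) {
    cand <- b + step / 2^k
    if (objective(cand, y, A1, A2) >= f0) return(cand)
  }
  b  # no acceptable step found; keep the current estimate
}

y  <- c(5, 6, 5, 20, 6, 5, 6)        # toy spectrum with a single peak
b0 <- rep(median(y), length(y))      # flat starting value at median height
A1 <- rep(1, length(y))              # illustrative weights, not package defaults
A2 <- rep(1, length(y))

objective(b0, y, A1, A2)
b1 <- halving_step(b0, rep(4, length(y)), y, A1, A2)
```

In this toy case the full step of 4 overshoots, and the search accepts the first halved step that does not lower the objective. A real Newton step would come from the gradient and Hessian of F, which this sketch does not compute.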
The smoothing factor sm.par corresponds to A[1]^{*} in Barkauskas (2009) and controls how large the estimated nth derivative of the baseline is allowed to be (for sm.ord = n).
From a practical standpoint, values of sm.ord larger than two do not seem to adequately smooth the baseline because the Hessian becomes computationally singular for any reasonable value of sm.par.
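As a small illustration of what "order of derivative to kill" means, the sketch below takes the estimated nth derivative to be the nth-order finite difference. That reading is an assumption, though it matches the second-difference term in the objective function. The variable names are made up for the example.

```r
# A quadratic sampled on an integer grid.
i    <- 1:10
quad <- 3 + 2 * i + 0.5 * i^2

d2 <- diff(quad, differences = 2)  # second differences: constant for a quadratic
d3 <- diff(quad, differences = 3)  # third differences: vanish for a quadratic

d2
d3
```

A penalty on squared nth differences therefore leaves polynomials of degree below n unpenalized, so with sm.ord = 2 a straight-line baseline incurs no smoothness penalty.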
The parameters sm.div, sm.norm.by, neg.div, and neg.norm.by determine the methods used to normalize the smoothness and negativity terms. The general forms are
A[1,i] = n^4 * A[1]^{*}/M[i]/p and A[2,i] = 1/M[i]/p.
Here, n = length(spect); p is sm.div or neg.div, as appropriate; and M[i] is determined by sm.norm.by or neg.norm.by, as appropriate. Values of "baseline" make M[i] = b[i]', where b[i]' is the currently estimated value of the baseline; values of "overestimate" make M[i] = b[i]'-y[i]; and values of "constant" make M[i] = σ, where σ is an estimate of the noise standard deviation.
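The general forms above can be written out as a short R function. This is a sketch under stated assumptions rather than the package's implementation: the function name `penalty_weights` is hypothetical, and `mad(y)` is used only as a stand-in noise estimate, since this page does not say how σ is computed.

```r
# Compute A[1,i] and A[2,i] from the general forms given above.
# b is the current baseline estimate, y the spectrum.
penalty_weights <- function(y, b, A1.star, sm.div, neg.div,
                            sm.norm.by = "baseline",
                            neg.norm.by = "baseline",
                            sigma = mad(y)) {   # assumed noise estimate
  n <- length(y)
  M <- function(by) switch(by,
    baseline     = b,                 # M[i] = b[i]'
    overestimate = b - y,             # M[i] = b[i]' - y[i]
    constant     = rep(sigma, n),     # M[i] = sigma
    stop("unknown normalization: ", by))
  list(A1 = n^4 * A1.star / M(sm.norm.by) / sm.div,
       A2 = 1 / M(neg.norm.by) / neg.div)
}

w <- penalty_weights(y = c(4, 5, 6, 5), b = rep(5, 4),
                     A1.star = 1e-2, sm.div = 0.5, neg.div = 0.25)
w$A1
w$A2
```

The sketch does not guard against M[i] being zero or negative, which can happen with "overestimate" wherever the baseline estimate is at or below the spectrum.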
The values of sm.norm.by and neg.norm.by can be abbreviated by their first letters and both have default value "baseline". The default values of NA for sm.div and neg.div are translated to sm.div = 0.5223145 and neg.div = 0.4210109, which are the appropriate parameters for the mass spectrometry machine that generated the spectra used to develop this package. It is distinctly possible that other machines will require different parameters; see Barkauskas (2009) for a description of how these parameters were obtained.
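The abbreviation and default rules described above behave like the following base R sketch. The helper `resolve_args` is hypothetical, and whether the package resolves its arguments with `match.arg` internally is an assumption. The numeric defaults are the ones quoted in the text.

```r
# Hypothetical helper mimicking the documented argument handling.
resolve_args <- function(sm.div = NA, neg.div = NA,
                         sm.norm.by  = c("baseline", "overestimate", "constant"),
                         neg.norm.by = c("baseline", "overestimate", "constant")) {
  sm.norm.by  <- match.arg(sm.norm.by)    # partial matching; first choice if missing
  neg.norm.by <- match.arg(neg.norm.by)
  if (is.na(sm.div))  sm.div  <- 0.5223145   # default quoted in the text
  if (is.na(neg.div)) neg.div <- 0.4210109   # default quoted in the text
  list(sm.div = sm.div, neg.div = neg.div,
       sm.norm.by = sm.norm.by, neg.norm.by = neg.norm.by)
}

r <- resolve_args(sm.norm.by = "o")   # "o" expands to "overestimate"
str(r)
```

Spectra from a different instrument would pass explicit values for sm.div and neg.div instead of relying on the NA defaults.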
If zero.rm = TRUE and y[a],...,y[b] = 0, then these values of the spectrum are set to be (y[a-1]+y[b+1])/2, the average of the surrounding values. (For typical MALDI FT-ICR spectra, a value of zero indicates an erased harmonic and should not be considered a real data point.)
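A minimal base R sketch of the zero replacement follows. It reads "average of surrounding values" (the wording in the zero.rm argument description) as the mean of the nearest nonzero neighbour on each side of a run of zeros. The function name `remove_zero_runs` is hypothetical, and using the single available neighbour for a run at either end of the spectrum is an assumption, since this page does not cover that case.

```r
# Replace each run of zeros with the mean of its two neighbouring values.
remove_zero_runs <- function(y) {
  r      <- rle(y == 0)                # runs of zero / nonzero values
  ends   <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  for (k in which(r$values)) {         # loop over the zero runs only
    a <- starts[k]
    b <- ends[k]
    left  <- if (a > 1) y[a - 1] else NA          # neighbour before the run
    right <- if (b < length(y)) y[b + 1] else NA  # neighbour after the run
    y[a:b] <- mean(c(left, right), na.rm = TRUE)  # edge runs use one neighbour
  }
  y
}

cleaned <- remove_zero_runs(c(4, 0, 0, 6, 3, 0, 5))
cleaned
```

A spectrum consisting entirely of zeros has no neighbours to average, and this sketch would return NaN for it.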
A list containing the following items:
baseline |
The computed baseline |
iter |
The number of iterations for convergence |
changed |
Numeric vector of length iter containing the number of indicator variables that switched value on each iteration |
The original algorithm was developed by Yuanxin Xi and David Rocke. The code was originally adapted from a Matlab program by Yuanxin Xi, then modified to account for the new methodology in Barkauskas (2009).
Don Barkauskas (barkda@wald.ucdavis.edu)
Barkauskas, D.A. (2009) “Statistical Analysis of Matrix-Assisted Laser Desorption/Ionization Fourier Transform Ion Cyclotron Resonance Mass Spectrometry Data with Applications to Cancer Biomarker Detection”. Ph.D. dissertation, University of California at Davis.
Barkauskas, D.A. et al. (2009) “Detecting glycan cancer biomarkers in serum samples using MALDI FT-ICR mass spectrometry data”. Bioinformatics, 25:2, 251–257.
Xi, Y. and Rocke, D.M. (2008) “Baseline Correction for NMR Spectroscopic Metabolomics Data Analysis”. BMC Bioinformatics, 9:324.