run.cluster.matrix {FTICRMS} | R Documentation |
Takes the file generated by run.lrg.peaks, identifies equivalent peaks in each spectrum, and fills in missing values.
run.cluster.matrix(pre.align = FALSE, align.method = "spline", trans.method = "shiftedlog", add.par = 0, subtract.base = FALSE, lrg.only = TRUE, calc.all.peaks = FALSE, masses = NULL, isotope.dist = 7, cluster.method = "ppm", cluster.constant = 10, num.pts = 5, R2.thresh = 0.98, oneside.min = 1, peak.method = "parabola", root.dir = ".", base.dir, peak.dir, lrg.dir, lrg.file = "lrg_peaks.RData", overwrite = FALSE, use.par.file = FALSE, par.file = "parameters.RData")
pre.align |
either FALSE, or a numeric vector of shifts to apply to spectra, or a two-component list (of the form described in the Note section below) to be used before identifying peaks from different spectra |
align.method |
alignment algorithm for peaks |
trans.method |
type of transformation to use on spectra before statistical analysis; currently, only "shiftedlog", "glog", and "none" are supported |
add.par |
additive parameter for "shiftedlog" or "glog" options for trans.method |
subtract.base |
logical; whether to subtract calculated baseline from spectrum |
lrg.only |
logical; whether to consider only peaks that have at least one “large” peak (i.e., one identified by run.lrg.peaks) |
calc.all.peaks |
logical; whether to calculate all possible peaks or only sufficiently large ones |
masses |
specific masses to test |
isotope.dist |
maximum distance for declaring isotopes |
cluster.method |
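method for deciding whether peaks in different spectra are the same peak; the Note section describes "constant", "ppm", and "usewidth" |
cluster.constant |
tolerance constant K used by cluster.method (see the Note section) |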
num.pts |
number of consecutive points needed for peak fitting |
R2.thresh |
R^2 value needed for peak fitting |
oneside.min |
minimum number of points on each side of local maximum for peak fitting |
peak.method |
method for locating peaks |
root.dir |
directory for parameters file and raw data |
base.dir |
directory for baseline files; default is paste(root.dir, "/Baselines", sep = "") |
peak.dir |
directory for peak location files; default is paste(root.dir, "/All_Peaks", sep = "") |
lrg.dir |
directory for large peaks file; default is paste(root.dir, "/Large_Peaks", sep = "") |
lrg.file |
name of file to store large peaks in |
overwrite |
whether to replace existing files with new ones |
use.par.file |
logical; if TRUE, then parameters are read from par.file in directory root.dir |
par.file |
string containing name of parameters file |
Reads in information from the file created by run.lrg.peaks, calculates the cluster matrix, fills in missing values, and overwrites the file named lrg.file in lrg.dir.
The resulting file contains variables
amps | data frame of amplitudes created by run.lrg.peaks |
centers | data frame of centers created by run.lrg.peaks |
clust.mat | data frame with columns given by samples and rows given by the distinct peaks in the samples |
num.sig | vector of the number of peaks in each row of clust.mat which were not missing |
lrg.peaks | the data frame of significant peaks created by run.lrg.peaks |
and is ready to be used by run.strong.peaks.
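As a small illustration of how num.sig relates to clust.mat, the sketch below counts the non-missing entries in each row of a toy cluster matrix. The sample names and values are invented for the example, and it assumes the count is taken before missing values are filled in; it is not the package's own code.

```r
# Toy cluster matrix: rows are distinct peaks, columns are samples.
# NA marks a peak that was not found in that sample (illustrative values).
clust.mat <- data.frame(
  sample1 = c(10.2, NA, 8.1),
  sample2 = c(10.4, 7.9, NA),
  sample3 = c(NA, 8.0, NA)
)

# Number of non-missing peaks in each row, as described for num.sig.
num.sig <- rowSums(!is.na(clust.mat))
print(num.sig)
```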
No value returned; the file is simply created.
If use.par.file = TRUE, then the parameters read in from the file overwrite any arguments entered in the function call.
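The precedence rule can be sketched as follows. The helper resolve_pars is hypothetical (it is not part of FTICRMS) and only illustrates one plausible way for values stored in par.file to replace same-named arguments; the package's actual mechanism may differ.

```r
# Hypothetical helper, not part of FTICRMS: values stored in the parameters
# file replace same-named arguments supplied by the caller.
resolve_pars <- function(call.args, use.par.file = FALSE,
                         par.file = "parameters.RData", root.dir = ".") {
  if (!use.par.file) return(call.args)
  file.env <- new.env()
  load(file.path(root.dir, par.file), envir = file.env)
  from.file <- as.list(file.env)
  call.args[names(from.file)] <- from.file  # values from the file win
  call.args
}

# Usage with a throwaway parameters file in a temporary directory.
tmp <- tempdir()
cluster.constant <- 25
save(cluster.constant, file = file.path(tmp, "parameters.RData"))
resolved <- resolve_pars(list(cluster.constant = 10, num.pts = 5),
                         use.par.file = TRUE, root.dir = tmp)
```

Here resolved$cluster.constant takes the value stored in the file, while num.pts, which the file does not define, keeps the value given in the call.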
pre.align is used if the spectra have not already been aligned by the mass spectroscopists. If it is not FALSE, it can either be a vector of additive shifts to be applied to the spectra, or a list with components targets and actual. In the last case, targets is a vector of target masses, and actual is a matrix with length(targets) columns and a row for each spectrum, actual[i,j] being the mass in spectrum i that should be matched exactly to targets[j], with NA being a valid entry in actual.
The matching is done (depending on the number of non-missing values in row i) either with a simple shift (one non-missing value), an affine transformation (two non-missing values), a piecewise affine transformation (three non-missing values), or an interpolation spline (four or more non-missing values).
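The case distinction above can be sketched in R as follows. The function align_masses is a hypothetical illustration, not the package's implementation: it takes one row of actual and the targets vector and maps masses x from that spectrum onto the target scale. Linear extrapolation outside the matched masses in the affine cases is an assumption, as is the use of splines::interpSpline with its default settings for the spline case.

```r
# Sketch only: map masses `x` from one spectrum onto the target scale.
# `actual.row` is that spectrum's row of `actual`; NA entries are skipped.
align_masses <- function(x, actual.row, targets) {
  ok <- !is.na(actual.row)
  a <- actual.row[ok]
  tg <- targets[ok]
  ord <- order(a)
  a <- a[ord]
  tg <- tg[ord]
  n <- length(a)
  if (n == 0) return(x)                # nothing to match against
  if (n == 1) return(x + (tg - a))     # simple shift
  if (n <= 3) {
    # Affine (n = 2) or piecewise affine (n = 3): pick the segment that
    # contains each x and extend the end segments linearly (an assumption).
    seg <- findInterval(x, a, all.inside = TRUE)
    slope <- (tg[seg + 1] - tg[seg]) / (a[seg + 1] - a[seg])
    return(tg[seg] + slope * (x - a[seg]))
  }
  # Four or more matched masses: interpolation spline through the pairs.
  sp <- splines::interpSpline(a, tg)
  stats::predict(sp, x)$y
}

targets <- c(100, 200, 300, 400)
shifted <- align_masses(c(150, 250), c(NA, 200.5, NA, NA), targets)
```

In each case the matched masses themselves are sent exactly to their targets, which is the property the description requires.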
Suppose cluster.constant = K and we have two peaks in different spectra with masses m[1] and m[2]. If cluster.method = "constant", then the peaks are considered to be the same peak if we have m[2] - m[1] < K. If cluster.method = "ppm", then the peaks are considered to be the same peak if we have m[2] - m[1] < K * m[2] * 1e-6. If cluster.method = "usewidth", then the algorithm uses the observation that log(Width_hat) and log(Center_hat) appear to be linearly related. Tolerances are then computed using this relationship.
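The "constant" and "ppm" criteria can be written out as a short check. The function same_peak is hypothetical and not part of the package; it assumes the two masses are ordered so that the difference is non-negative, with the larger mass playing the role of m[2]. The "usewidth" method is omitted because its fitted relationship is not specified here.

```r
# Sketch only: decide whether two masses count as the same peak.
# Assumes the smaller mass is m[1] and the larger is m[2].
same_peak <- function(m1, m2, K = 10, cluster.method = c("ppm", "constant")) {
  cluster.method <- match.arg(cluster.method)
  lo <- pmin(m1, m2)
  hi <- pmax(m1, m2)
  tol <- switch(cluster.method,
                constant = K,              # absolute tolerance
                ppm = K * hi * 1e-6)       # tolerance in parts per million
  (hi - lo) < tol
}

close.ppm <- same_peak(1000.000, 1000.005, K = 10, cluster.method = "ppm")
far.ppm <- same_peak(1000.000, 1000.020, K = 10, cluster.method = "ppm")
close.const <- same_peak(1000.000, 1000.020, K = 0.05, cluster.method = "constant")
```

With K = 10 and masses near 1000, the "ppm" tolerance is about 0.01, so a difference of 0.005 is accepted and a difference of 0.02 is rejected.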
Don Barkauskas (barkda@wald.ucdavis.edu)
Barkauskas, D.A. (2009) “Statistical Analysis of Matrix-Assisted Laser Desorption/Ionization Fourier Transform Ion Cyclotron Resonance Mass Spectrometry Data with Applications to Cancer Biomarker Detection”. Ph.D. dissertation, University of California at Davis.
Barkauskas, D.A. et al. (2009) “Detecting glycan cancer biomarkers in serum samples using MALDI FT-ICR mass spectrometry data”. Bioinformatics, 25:2, 251–257.
run.lrg.peaks, run.strong.peaks, interpSpline