run.cluster.matrix {FTICRMS}    R Documentation

Identify Equivalent Peaks from Different Subjects

Description

Takes the file generated by run.lrg.peaks, identifies equivalent peaks in each spectrum, and fills in missing values.

Usage

run.cluster.matrix(pre.align = FALSE, align.method = "spline", 
                   trans.method = "shiftedlog", add.par = 10,
                   lrg.only = TRUE, masses = NULL, isotope.dist = 7,
                   cluster.method = "ppm", cluster.constant = 10, 
                   root.dir = ".", base.dir, peak.dir, lrg.dir, 
                   lrg.file = "lrg.peaks.RData", overwrite = FALSE,
                   use.par.file = FALSE, par.file = "parameters.RData")

Arguments

pre.align either FALSE, or a numeric vector of shifts to apply to spectra, or a two-component list (of the form described in the Note section below) to be used before identifying peaks from different spectra
align.method type of alignment to use on spectra before statistical analysis; currently, only "spline" and "none" are supported
trans.method type of transformation to use on spectra before statistical analysis; currently, only "shiftedlog", "glog", and "none" are supported
add.par additive parameter for "shiftedlog" or "glog" options for trans.method
lrg.only logical; whether to consider only groups of equivalent peaks in which at least one peak is “significant”; i.e., identified by run.lrg.peaks
masses numeric vector of specific masses to test
isotope.dist maximum number of isotope peaks to look at (in addition to main peak)
cluster.method one of "ppm", "constant", or "usewidth"; see the Note section below
cluster.constant parameter used in running cluster.method
root.dir string containing location of raw data directory
base.dir directory for baseline-corrected files; default is paste(root.dir, "/Baseline_Corrected", sep = "")
peak.dir directory for peak location files; default is paste(root.dir, "/All_Peaks", sep = "")
lrg.dir directory for significant peaks file; default is paste(root.dir, "/Large_Peaks", sep = "")
lrg.file string containing name of significant peaks file
overwrite logical; whether to replace existing files with new ones
use.par.file logical; if TRUE, then parameters are read from par.file in directory root.dir
par.file string containing name of parameters file
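
The default directory arguments follow directly from root.dir. A minimal sketch of how those documented defaults resolve, using only base R; the root.dir value is a made-up example path, and the package may assemble the paths differently internally:

```r
# Hypothetical raw-data location; substitute your own.
root.dir <- "/data/study1"

# Defaults as documented for base.dir, peak.dir, and lrg.dir.
base.dir <- paste(root.dir, "/Baseline_Corrected", sep = "")
peak.dir <- paste(root.dir, "/All_Peaks", sep = "")
lrg.dir  <- paste(root.dir, "/Large_Peaks", sep = "")

# Full path of the significant-peaks file with the default lrg.file.
lrg.path <- file.path(lrg.dir, "lrg.peaks.RData")
print(c(base.dir, peak.dir, lrg.dir, lrg.path))
```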

Details

Reads in information from the file created by run.lrg.peaks, calculates the cluster matrix, fills in missing values, and overwrites the file named lrg.file in lrg.dir. The resulting file contains the variables
amps data frame of amplitudes created by run.lrg.peaks
centers data frame of centers created by run.lrg.peaks
clust.mat data frame with columns given by samples and rows given by the distinct peaks in the samples
num.sig vector giving, for each row of clust.mat, the number of peaks that were not missing
lrg.peaks the data frame of significant peaks created by run.lrg.peaks

and is ready to be used by run.strong.peaks.
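
The relationship between clust.mat and num.sig can be illustrated with a toy cluster matrix. This is only a sketch in base R with invented values, sample names (S1 to S3), and row labels; it assumes that num.sig counts the non-missing entries of each row before any filling is done, and it does not reproduce the package's actual clustering or fill-in procedure.

```r
# Toy cluster matrix: rows are distinct peaks, columns are samples.
# NA marks a peak that was not found in that sample. All values are invented.
clust.mat <- data.frame(
  S1 = c(12.1, NA, 8.4),
  S2 = c(11.8, 5.2, NA),
  S3 = c(NA, 5.0, NA)
)
rownames(clust.mat) <- c("1000.52", "1162.57", "1324.63")  # hypothetical peak masses

# Number of non-missing peaks in each row.
num.sig <- rowSums(!is.na(clust.mat))

# Save and reload, as one would with the file named lrg.file in lrg.dir.
tmp <- tempfile(fileext = ".RData")
save(clust.mat, num.sig, file = tmp)
env <- new.env()
load(tmp, envir = env)
print(ls(env))
print(env$num.sig)
```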

Value

No value returned; the file is simply created.

Note

If use.par.file = TRUE, then the parameters read in from the file override any arguments entered in the function call.

pre.align is used if the spectra have not already been aligned by the mass spectroscopists. If it is not FALSE, it can be either a vector of additive shifts to be applied to the spectra, or a list with components targets and actual. In the latter case, targets is a vector of target masses, and actual is a matrix with length(targets) columns and one row for each spectrum, actual[i,j] being the mass in spectrum i that should be matched exactly to targets[j], with NA being a valid entry in actual. Depending on the number of non-missing values in row i, the matching is done with a simple shift (one non-missing value), an affine transformation (two non-missing values), a piecewise affine transformation (three non-missing values), or an interpolation spline (four or more non-missing values).
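
As an illustration of the list form of pre.align, the following sketch builds a targets/actual pair and maps observed masses onto the targets for the shift, affine, and spline cases. The masses are invented, make.map is a hypothetical helper that is not part of FTICRMS, and the use of splines::interpSpline is an assumption based on the See Also entry; the package's own alignment code may differ, and the three-value piecewise affine case is not sketched.

```r
# Hypothetical target masses and, for three spectra, the observed masses
# that should map onto them (NA = no usable peak in that spectrum).
targets <- c(1000, 1500, 2000, 2500, 3000)
actual <- rbind(
  c(NA,      1500.02, NA,      NA,      NA),       # one value: shift
  c(1000.01, NA,      NA,      2500.03, NA),       # two values: affine
  c(1000.01, 1500.02, 2000.02, 2500.03, 3000.05)   # five values: spline
)
pre.align <- list(targets = targets, actual = actual)

# Build a function mapping observed masses to corrected masses for row i.
make.map <- function(i) {
  ok <- !is.na(actual[i, ])
  x <- actual[i, ok]
  y <- targets[ok]
  if (sum(ok) == 1) {
    function(m) m + (y - x)                  # simple shift
  } else if (sum(ok) == 2) {
    slope <- diff(y) / diff(x)
    function(m) y[1] + slope * (m - x[1])    # affine map through both points
  } else if (sum(ok) >= 4) {
    sp <- splines::interpSpline(x, y)        # interpolation spline
    function(m) predict(sp, m)$y
  } else {
    stop("piecewise affine case (three values) not sketched here")
  }
}

# Each map should send the observed masses back onto their targets.
for (i in 1:3) print(make.map(i)(actual[i, !is.na(actual[i, ])]))
```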

Suppose cluster.constant = K and we have two peaks in different spectra with masses m_1 and m_2, labeled so that m_1 <= m_2. If cluster.method = "constant", then the peaks are considered to be the same peak if m_2 - m_1 < K. If cluster.method = "ppm", then the peaks are considered to be the same peak if m_2 - m_1 < K*m_2/10^6. If cluster.method = "usewidth", then the algorithm uses the observation that log(Width_hat) and log(Center_hat) appear to be linearly related, and tolerances are computed from this relationship.
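
The "constant" and "ppm" rules can be written out as a short check. This is a sketch rather than the package's code: same.peak is a hypothetical helper, the masses are invented, and it assumes m1 <= m2 so that the difference is non-negative. The "usewidth" rule is omitted because its fitted relationship is not specified here.

```r
# TRUE if two peak masses fall within tolerance under the given rule.
# Assumes m1 <= m2 so that m2 - m1 is non-negative.
same.peak <- function(m1, m2, K, method = c("ppm", "constant")) {
  method <- match.arg(method)
  tol <- switch(method,
    constant = K,               # fixed mass tolerance
    ppm      = K * m2 / 10^6    # tolerance grows with mass
  )
  (m2 - m1) < tol
}

# With K = 10 ppm near mass 1000, the tolerance is about 0.01.
r1 <- same.peak(1000.000, 1000.005, K = 10, method = "ppm")
r2 <- same.peak(1000.000, 1000.020, K = 10, method = "ppm")
r3 <- same.peak(1000.000, 1000.020, K = 0.05, method = "constant")
print(c(r1, r2, r3))
```

Under "ppm" the same absolute mass difference can count as a match at high mass and a mismatch at low mass, which is the practical difference from "constant".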

Author(s)

Don Barkauskas (barkda@wald.ucdavis.edu)

References

Barkauskas, D.A. et al. (2008) “Detecting glycan cancer biomarkers in serum samples using MALDI FT-ICR mass spectrometry data”. Submitted to Bioinformatics.

See Also

run.lrg.peaks, run.strong.peaks, interpSpline


[Package FTICRMS version 0.6 Index]