msc.peaks.align {caMassClass} | R Documentation |
Align peaks from multiple protein mass spectra (SELDI) samples into a single "biomarker" matrix
msc.peaks.align(Peaks, SampFrac=0.3, BinSize=c(0.002, 0.008), ...)
msc.peaks.alignment(S, M, H, Tag=0, SampFrac=0.3, BinSize=c(0.002, 0.008), ...)
Peaks |
Peak information, in one of two formats: the name of a file containing
the data, or the data itself. In the first case, Peaks is a string
containing the path to a file saved by msc.peaks.find,
getPeaks (from the PROcess package), or
other software. In the second case, it is a data frame in the same
format as returned by
msc.peaks.find. A third way to pass the same input data is through
the S, M, H and Tag variables (described below) used by the
msc.peaks.alignment function. |
S |
Peak sample number. Unique number of the sample the peak belongs to.
Likely to come from Peaks$Spectrum. . |
M |
Peak center mass. Position of the peak on the x-axis.
Likely to come from Peaks$Substance.Mass . |
H |
Peak height. Likely to come from Peaks$Intensity . |
Tag |
Peak sample name. Unique name of the sample the peak belongs to.
Likely to come from Peaks$Spectrum.Tag .
Optional, since it is used only to set the column names of the output data. |
SampFrac |
After peak alignment, bins with fewer peaks than
SampFrac*nSamp are removed, where nSamp is the number of samples. |
BinSize |
Lower and upper bounds of bin sizes, based on the expected
experimental variation in the mass (m/z) values. The size of a bin is
measured as (R-L)/mean(R,L), where L and R are the masses (m/z values) of
its left and right boundaries. All resulting bin sizes will be between
BinSize[1] and BinSize[2]. Since SELDI data is often assumed
to have +-0.3% mass drift, a good bin size is twice that
number (0.006).
Same as |
... |
Two additional parameters that can be passed to
msc.peaks.clust are mostly for expert users fine-tuning the code:
|
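The bin-size measure and the SampFrac threshold described above can be illustrated with a small base-R sketch. The helper name bin_size, the masses, the sample count and the peak counts are all hypothetical values chosen for illustration; none of them come from caMassClass.

```r
# Relative bin size as defined above: (R - L) / mean(R, L).
# In R the mean of two numbers must be written mean(c(L, R)).
bin_size <- function(L, R) (R - L) / mean(c(L, R))   # hypothetical helper

BinSize <- c(0.002, 0.008)            # default bounds from the usage line
s <- bin_size(L = 4990, R = 5010)     # invented bin spanning 20 mass units
within_bounds <- s >= BinSize[1] && s <= BinSize[2]

# SampFrac threshold: bins with fewer peaks than SampFrac * nSamp are removed.
SampFrac      <- 0.3
nSamp         <- 40                   # invented number of samples
peaks_per_bin <- c(35, 13, 11, 40)    # invented peak counts for four bins
keep <- peaks_per_bin >= SampFrac * nSamp
```

With these invented numbers the bin has relative size 0.004, which lies inside the default bounds, and only the bin holding 11 peaks falls below the threshold of 12.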
Two interfaces are provided to the same function:
msc.peaks.alignment is a lower-level function with more detailed
inputs and outputs. It may be easier to customize for purposes other than
processing SELDI data.
msc.peaks.align is a higher-level function with a simpler interface,
customized for processing SELDI data.
This function aligns peaks from different samples into bins in such a way as to satisfy constraints in the following order:
BinSize[1]
and BinSize[2]
The algorithm used does the following:
tol
tolerance from it) then minimizes the number of
multiple peaks from the same sample after the cut
SampFrac*nSamp
The algorithm for peak alignment is described as a recursive algorithm, but the actual implementation uses an internal stack instead, in order to increase speed.
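To illustrate how a recursively described splitting procedure can be driven by an explicit stack, here is a minimal base-R sketch. It is not the caMassClass algorithm: the function split_bins, its largest-gap cutting rule and the test masses are assumptions made only for this illustration, and masses are assumed to be positive.

```r
# Hypothetical helper, NOT part of caMassClass: split sorted masses into bins
# whose relative width (R - L) / mean(R, L) is at most max_size, cutting each
# oversized bin at its largest gap. An explicit stack replaces recursion.
split_bins <- function(mass, max_size = 0.008) {
  mass  <- sort(mass)
  stack <- list(c(1L, length(mass)))    # pending index ranges (from, to)
  bins  <- list()                       # accepted index ranges
  while (length(stack) > 0) {
    rng <- stack[[length(stack)]]       # pop the top range
    stack[[length(stack)]] <- NULL
    from <- rng[1]; to <- rng[2]
    L <- mass[from]; R <- mass[to]
    if (from == to || (R - L) / mean(c(L, R)) <= max_size) {
      bins[[length(bins) + 1]] <- rng   # small enough: accept as a bin
    } else {
      gaps <- diff(mass[from:to])
      cut  <- from + which.max(gaps) - 1L            # last index of left part
      stack[[length(stack) + 1]] <- c(cut + 1L, to)  # right part
      stack[[length(stack) + 1]] <- c(from, cut)     # left part, handled next
    }
  }
  # One row per bin: mass of its left-most and right-most peak
  do.call(rbind, lapply(bins, function(r) c(left = mass[r[1]], right = mass[r[2]])))
}

b <- split_bins(c(1000, 1001, 1002, 1050, 1051, 2000))
```

Each loop iteration does the work of one recursive call: a range is either accepted as a bin or replaced on the stack by its two halves. Pushing the left half last means bins are emitted in order of increasing mass.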
Bmrks |
Biomarker matrix containing one sample per column and one
biomarker per row. If a given sample does not have a peak in some bin, then
NA is inserted. |
BinBounds |
Masses of the left-most and right-most peaks in each bin. |
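A short base-R sketch of how such a result might be inspected. The list Y below is a toy stand-in with invented values, and the two-column layout of BinBounds (one row per bin) is an assumption, not something this page states.

```r
# Toy stand-in for the returned list; all values are invented for illustration.
Y <- list(
  Bmrks = matrix(c(10.2,  NA, 9.8,
                     NA, 4.1, 3.9,
                    7.5, 7.7, 7.4),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(NULL, c("s1", "s2", "s3"))),
  # Assumed layout: one row per bin, columns for left and right boundary mass
  BinBounds = cbind(left = c(2010, 4995, 8120), right = c(2016, 5008, 8140))
)

n_found <- rowSums(!is.na(Y$Bmrks))         # samples with a peak in each bin
centers <- rowMeans(Y$BinBounds)            # midpoint mass of each bin
print(Y$Bmrks, na.print = ".", digits = 2)  # show missing peaks as dots
```

Counting non-NA entries per row gives the number of samples that contributed a peak to each biomarker, which is the quantity the SampFrac threshold is compared against.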
Jarek Tuszynski (SAIC) jaroslaw.w.tuszynski@saic.com
The initial version of this function started as an implementation of the algorithm described on the webpage of the Virginia Prostate Center (at Virginia Medical School) documenting their PeakMiner software. See http://www.evms.edu/vpc/seldi/peakminer.pdf
msc.peaks.find, getPeaks (from PROcess package), or Ciphergen's software
msc.biomarkers.fill or msc.copies.merge
msc.preprocess.run pipeline
msc.peaks.clust function to do most of the work
msc.peaks.read.csv function to read peak file
msc.biomarkers.write.csv function to save results
pk2bmkr from PROcess package performs similar function.
# load "Data_IMAC.Rdata" file containing raw MS spectra 'X'
if (!file.exists("Data_IMAC.Rdata")) example("msc.project.read")
load("Data_IMAC.Rdata")

# Find and Align peaks
Peaks = msc.peaks.find(X)
cat(nrow(Peaks), "peaks were found in", Peaks[nrow(Peaks),2], "files.\n")
Y = msc.peaks.align(Peaks)
print( t(Y$Bmrks) , na.print=".", digits=2)
stopifnot( dim(Y$Bmrks)==c(22, 40) )