msc.peaks.find {caMassClass} | R Documentation |
Find Peaks in a Batch of Protein Mass Spectra (SELDI) Data.
msc.peaks.find(X, SNR=2, span=c(81,11), zerothresh=0.9)
X |
Spectrum data either in matrix format [nFeatures x nSamples] or in
3D array format [nFeatures x nSamples x nCopies]. Row names
(rownames(X)) store the M/Z mass of each row. |
SNR |
Signal-to-noise ratio (z-score) criterion for peak detection.
Similar to the SoN argument of isPeak from the PROcess
package. |
span |
Two moving-window widths. The smaller one is used for smoothing
and local-maximum finding; the larger one is used for local variance
estimation. Similar to the span and sm.span arguments of
isPeak from the PROcess package. |
zerothresh |
Intensity threshold criterion for peak detection. Non-negative
numbers in the range [0,1), like the default 0.9, are used to
calculate a single threshold shared by all samples, using
quantile(X,zerothresh). Negative numbers in the range (-1,0) are
used to calculate a separate threshold for each sample i, using
quantile(X[,i],-zerothresh).
Similar to the zerothrsh argument of
isPeak from the PROcess package. |
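The two zerothresh modes can be sketched in base R as follows. This is a minimal illustration, not caMassClass code: the helper name peak_thresholds is hypothetical, and it assumes samples are stored as columns of X, as the [nFeatures x nSamples] layout implies.

```r
# Hypothetical helper: turn zerothresh into one detection threshold per sample.
# Assumption: X is [nFeatures x nSamples], so each sample is a column.
peak_thresholds <- function(X, zerothresh = 0.9) {
  stopifnot(zerothresh > -1, zerothresh < 1)
  if (zerothresh >= 0) {
    # a single global threshold, computed over all samples and shared by all
    rep(unname(quantile(X, zerothresh)), ncol(X))
  } else {
    # a separate threshold for each sample (column)
    apply(X, 2, function(x) unname(quantile(x, -zerothresh)))
  }
}

set.seed(1)
X <- matrix(rexp(200), nrow = 100, ncol = 2)  # toy data: 100 features x 2 samples
peak_thresholds(X, 0.9)   # the same value repeated for both samples
peak_thresholds(X, -0.9)  # one value per sample
```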
Peak finding is done using the following algorithm, applied to each sample (column) j of X:
x | = | X[,j] |
thresh | = | if(zerothresh>=0) quantile(X,zerothresh) else quantile(x,-zerothresh) |
sig | = | runmean(x, span[2]) |
rMax | = | runmax(x, span[2]) |
rAvr | = | runmed(x, span[1]) |
rStd | = | runmad(x, span[1], center=rAvr) |
peak | = | (rMax == x) & (sig > thresh) & (sig-rAvr > SNR*rStd) |
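A minimal base-R sketch of the per-sample algorithm above, useful for seeing how the running statistics combine. It is not the package's implementation: find_peaks_1d is a hypothetical name, the windows are computed with simple O(n*k) loops rather than fast C-level code, windows are assumed to be centred and truncated at the spectrum ends, and rStd is assumed to use the stats::mad scale constant (1.4826). The edge handling and scaling of runmean, runmax and runmad may differ.

```r
# Hypothetical sketch of the peak criteria for one spectrum x (base R only).
# Assumptions: centred windows truncated at the ends; mad() default constant.
find_peaks_1d <- function(x, thresh, SNR = 2, span = c(81, 11)) {
  n <- length(x)
  win <- function(i, k) x[max(1, i - k %/% 2):min(n, i + k %/% 2)]
  idx <- seq_len(n)
  sig  <- vapply(idx, function(i) mean(win(i, span[2])), numeric(1))    # smoothing
  rMax <- vapply(idx, function(i) max(win(i, span[2])), numeric(1))     # local maximum
  rAvr <- vapply(idx, function(i) median(win(i, span[1])), numeric(1))  # local level
  rStd <- vapply(idx, function(i) mad(win(i, span[1]), center = rAvr[i]),
                 numeric(1))                                            # local spread
  peak <- (rMax == x) & (sig > thresh) & (sig - rAvr > SNR * rStd)
  which(peak)  # row indices of detected peaks
}

# Synthetic spectrum: low-level noise plus two Gaussian bumps (toy data)
set.seed(42)
mz <- 1:600
x <- rnorm(600, sd = 0.05) + 3 * exp(-(mz - 200)^2 / 50) + 2 * exp(-(mz - 450)^2 / 50)
pk <- find_peaks_1d(x, thresh = unname(quantile(x, 0.9)))
pk
```

On this synthetic spectrum the sketch should report one peak near each bump centre: noise-only windows have a smoothed signal far below the 0.9 quantile, so they fail the sig > thresh test even where they are local maxima.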
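This means that a point has to meet all of the following criteria to be classified as a peak:
1. It is the maximum of x within its span[2] neighborhood (rMax == x).
2. Its smoothed signal (sig) is above the threshold thresh derived from the user-defined zerothresh.
3. Its smoothed signal exceeds the local running median (rAvr) by more than SNR times the local running MAD (rStd).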
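To see which criterion rejects a given candidate, the three tests can be evaluated separately. The values below are made-up toy numbers, not output of msc.peaks.find; they only show how the logical conditions combine into the final peak decision.

```r
# Toy per-position summaries for four hypothetical candidate points
cand <- data.frame(
  x    = c(5, 4, 6, 6),         # raw intensity
  rMax = c(5, 5, 6, 6),         # running max over span[2]
  sig  = c(3, 3, 1.2, 2),       # running mean over span[2]
  rAvr = c(1, 1, 1, 1),         # running median over span[1]
  rStd = c(0.5, 0.5, 0.5, 0.8)  # running MAD over span[1]
)
thresh <- 1.5
SNR <- 2

# Evaluate each criterion on its own, then combine them
crit <- with(cand, data.frame(
  local_max    = rMax == x,
  above_thresh = sig > thresh,
  snr_ok       = (sig - rAvr) > SNR * rStd
))
crit$is_peak <- crit$local_max & crit$above_thresh & crit$snr_ok
crit  # is_peak: TRUE FALSE FALSE FALSE
```

Only the first candidate passes all three tests; the second is not a local maximum, the third is below the intensity threshold, and the fourth lacks enough signal-to-noise ratio.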
It is very similar to the isPeak and getPeaks functions from the
PROcess library (ver 1.3.2) written by Xiaochun Li. For example,
getPeaks(X, PeakFile, SoN=SNR, span=span[1], sm.span=span[2],
zerothrsh=zerothresh, area.w=0.003, ratio=0)
would give results very similar to those of msc.peaks.find.
The differences include: speed (msc.peaks.find uses much faster
C-level code), a different use of the signal-to-noise ratio
variable, and the fact that msc.peaks.find does not perform or use
area calculations.
A data frame, in the same format as the data saved in peakinfofile, with
five components:
Spectrum.Tag | sample name of each peak |
Spectrum. | sample number of each peak |
Peak. | peak number within each sample |
Intensity | peak height (intensity) |
Substance.Mass | x-axis position, i.e. the mass of the peak measured in M/Z, extracted from the row names of the X matrix. |
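The layout of the returned data frame can be illustrated with a small, hypothetical helper, assemble_peaks, that turns per-sample peak row indices into the five columns listed above. It assumes sample names come from colnames(X) and masses from rownames(X); it is a sketch of the output format only, not the package's code.

```r
# Hypothetical helper: build a peak table from per-sample peak row indices.
# X: [nFeatures x nSamples] with rownames = M/Z; peak_idx: one index vector per column.
assemble_peaks <- function(X, peak_idx) {
  mass <- as.numeric(rownames(X))
  tags <- if (is.null(colnames(X))) as.character(seq_len(ncol(X))) else colnames(X)
  rows <- lapply(seq_len(ncol(X)), function(j) {
    i <- peak_idx[[j]]
    if (length(i) == 0) return(NULL)  # samples without peaks contribute no rows
    data.frame(Spectrum.Tag   = tags[j],
               Spectrum.      = j,
               Peak.          = seq_along(i),
               Intensity      = unname(X[i, j]),
               Substance.Mass = mass[i])
  })
  out <- do.call(rbind, rows)
  if (is.null(out)) return(NULL)      # no peaks in any sample
  rownames(out) <- NULL
  out
}

X <- matrix(c(0, 5, 1, 7,  0, 0, 2, 0), nrow = 4,
            dimnames = list(c("1000", "1001.5", "1003", "1004.5"), c("s1", "s2")))
P <- assemble_peaks(X, list(c(2, 4), 3))
P
P[nrow(P), 2]  # sample number stored in the last row
```

Because Spectrum. holds the sample number, P[nrow(P), 2] gives the number of the last sample that has at least one peak, which is how the examples at the end of this page count files.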
Jarek Tuszynski (SAIC) jaroslaw.w.tuszynski@saic.com
Used in the msc.preprocess.run and msc.project.run pipelines.
msc.mass.adjust, msc.peaks.align or pk2bmkr can be used to align peaks
from different samples in order to find biomarkers.
msc.peaks.read.csv and msc.peaks.write.csv for reading and writing peak files.
isPeak and getPeaks from the PROcess package are very similar.
Uses the runmax, runmean, runmed and runmad functions.
library(caMassClass)

# load "Data_IMAC.Rdata" file containing raw MS spectra 'X'
if (!file.exists("Data_IMAC.Rdata")) example("msc.project.read")
load("Data_IMAC.Rdata")
Peaks = msc.peaks.find(X) # Find Peaks
cat(nrow(Peaks), "peaks were found in", Peaks[nrow(Peaks),2], "files.\n")
stopifnot( nrow(Peaks)==823 )

# work directly with data from the input files
directory = system.file("Test", package = "caMassClass")
X = msc.rawMS.read.csv(directory, "IMAC_normal_.*csv")
Peaks = msc.peaks.find(X) # Find Peaks
cat(nrow(Peaks), "peaks were found in", Peaks[nrow(Peaks),2], "files.\n")
stopifnot( nrow(Peaks)==424 )