msc.peaks.align {caMassClass}    R Documentation

Align Peaks of Mass Spectra into a "Biomarker" Matrix

Description

Align peaks from multiple protein mass spectra (SELDI) samples into a single "biomarker" matrix.

Usage

  msc.peaks.align(Peaks, SampFrac=0.3, BinSize=c(0.002, 0.008), ...)
  msc.peaks.alignment(S,  M,  H, Tag=0, SampFrac=0.3, BinSize=c(0.002, 0.008), ...)

Arguments

Peaks Peak information, in one of two formats: a file name pointing to the data, or the data itself. In the first case, Peaks is a string containing the path to a file saved by msc.peaks.find, by getPeaks (from the PROcess package), or by other software. In the second case, it is a data frame in the same format as the one returned by msc.peaks.find. A third way to pass the same input data is through the S, M, H and Tag variables (described below) used by the msc.peaks.alignment function.
S Peak sample number. Unique number of the sample the peak belongs to. Likely to come from Peaks$Spectrum.
M Peak center mass. Position of the peak on the x-axis. Likely to come from Peaks$Substance.Mass.
H Peak height. Likely to come from Peaks$Intensity.
Tag Peak sample name. Unique name of the sample the peak belongs to. Likely to come from Peaks$Spectrum.Tag. Optional, since it is used only to set the column names of the output data.
SampFrac After peak alignment, bins with fewer peaks than SampFrac*nSamp, where nSamp is the number of samples, are removed.
BinSize Lower and upper bounds of bin sizes, based on the expected experimental variation in the mass (m/z) values. The size of any bin is measured as (R-L)/mean(R,L), i.e. (R-L)/((R+L)/2), where L and R are the masses (m/z values) of the left and right boundaries. All resulting bin sizes will be between BinSize[1] and BinSize[2]. Since SELDI data is often assumed to have a ±0.3% mass drift, a good bin size is twice that number (0.006). Same as the BinSize variable in msc.peaks.clust, except for the default.

... Two additional parameters that can be passed on to msc.peaks.clust, mostly for expert users fine-tuning the code:
  • tol - gaps bigger than tol*max(gap) are assumed to be the same size as the largest gap. See Details.
  • verbose - logical flag that turns debugging printouts on.
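To make the BinSize definition concrete, the short base-R sketch below computes the relative size of a hypothetical bin running from m/z 5000 to 5030 and checks it against the default bounds. The helper name bin.size and the example masses are assumptions for illustration only, not part of caMassClass. Note that in actual R code the mean has to be written out, because mean(R, L) would pass L as the trim argument instead of averaging the two masses.

```r
# Relative bin size as defined for BinSize: (R - L) / mean(R, L).
# Hypothetical helper; the mean of the two masses is written out explicitly,
# because mean(R, L) in R would treat L as the 'trim' argument.
bin.size <- function(L, R) (R - L) / ((R + L) / 2)

BinSize <- c(0.002, 0.008)       # default lower and upper bounds
s <- bin.size(5000, 5030)        # hypothetical bin from m/z 5000 to 5030
print(s)                         # about 0.00598
in.range <- s >= BinSize[1] && s <= BinSize[2]
print(in.range)                  # TRUE: the bin size is within the bounds

# A +-0.3% mass drift spans 2 * 0.003 = 0.006 in relative terms
drift <- 0.003
print(2 * drift)
```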

Details

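Two interfaces are provided to the same function:
  • msc.peaks.align - accepts the peak information as a single Peaks argument (a file name or a data frame).
  • msc.peaks.alignment - accepts the same information as separate S, M, H and Tag vectors.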
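As an illustration of how the two interfaces relate, the snippet below builds a small hypothetical peak table using the column names mentioned under Arguments (Spectrum, Spectrum.Tag, Substance.Mass, Intensity) and extracts the vectors that msc.peaks.alignment expects. The exact layout of real msc.peaks.find output is an assumption here, and the package calls are left as comments so the snippet runs without caMassClass installed.

```r
# Hypothetical peak table: 2 samples with 2 peaks each
Peaks <- data.frame(
  Spectrum.Tag   = c("s1", "s1", "s2", "s2"),
  Spectrum       = c(1, 1, 2, 2),
  Substance.Mass = c(5000, 7000, 5010, 7020),
  Intensity      = c(12, 4, 15, 6),
  stringsAsFactors = FALSE
)

# Vectors for the low-level interface
S   <- Peaks$Spectrum         # sample number of each peak
M   <- Peaks$Substance.Mass   # peak center mass (m/z)
H   <- Peaks$Intensity        # peak height
Tag <- Peaks$Spectrum.Tag     # sample name (optional, used for column names)

# With caMassClass loaded, both calls should pass the same data:
#   Y1 <- msc.peaks.align(Peaks)
#   Y2 <- msc.peaks.alignment(S, M, H, Tag)
```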

This function aligns peaks from different samples into bins in such a way as to satisfy the following constraints, in this order:

The algorithm used does the following:

The peak-alignment algorithm is described as recursive, but to increase speed the actual implementation uses an internal stack instead of recursion.
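The sketch below illustrates the general technique of replacing recursive bin splitting with an explicit stack. It is a hypothetical, simplified illustration in base R, not the caMassClass source: the function name split.bins.stack is invented, and it assumes a single splitting rule (split a bin at its largest internal gap while it is wider than max.size or contains two peaks from the same sample), ignoring the BinSize[1] lower bound, tol and SampFrac.

```r
# Hypothetical sketch: stack-based (non-recursive) splitting of sorted peaks.
# Simplified rule (an assumption, not caMassClass's exact algorithm): split a
# bin at its largest gap while it is too wide or holds duplicate samples.
split.bins.stack <- function(M, S, max.size = 0.008) {
  o <- order(M)
  M <- M[o]                        # peak masses, sorted
  S <- S[o]                        # matching sample numbers
  stack <- list(c(1L, length(M)))  # pending bins as (first, last) indices
  bins  <- list()                  # finished bins as (left, right) masses
  while (length(stack) > 0) {
    b <- stack[[length(stack)]]    # pop the most recently pushed bin
    stack[[length(stack)]] <- NULL
    lo <- b[1]
    hi <- b[2]
    width <- (M[hi] - M[lo]) / ((M[hi] + M[lo]) / 2)  # relative bin size
    dup   <- anyDuplicated(S[lo:hi]) > 0              # same sample twice?
    if (hi > lo && (width > max.size || dup)) {
      k <- lo + which.max(diff(M[lo:hi])) - 1L  # split after index k
      stack[[length(stack) + 1]] <- c(lo, k)    # push both halves
      stack[[length(stack) + 1]] <- c(k + 1L, hi)
    } else {
      bins[[length(bins) + 1]] <- c(M[lo], M[hi])
    }
  }
  bb <- do.call(rbind, bins)
  bb[order(bb[, 1]), , drop = FALSE]  # one row per bin, left to right
}

bb <- split.bins.stack(c(100, 100.1, 200, 200.2, 300), c(1, 2, 1, 2, 1))
print(bb)  # three bins: 100-100.1, 200-200.2 and 300-300
```

Each loop iteration does the work one recursive call would do, so the depth of splitting is limited only by memory rather than by R's limit on nested evaluation (see options("expressions")).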

Value

Bmrks Biomarker matrix containing one sample per column and one biomarker per row. If a given sample has no peak in some bin, then NA is inserted.
BinBounds Masses of the left-most and right-most peaks in each bin.
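The following base-R sketch shows the layout of these two outputs and how the SampFrac rule from Arguments removes sparsely populated bins. It assumes the peaks have already been assigned to bins: the bin vector and all numbers are hypothetical inputs, not something msc.peaks.alignment exposes. SampFrac is set to 0.7 rather than the default 0.3 so that one bin is actually dropped.

```r
# Hypothetical input: 5 peaks from 3 samples, already assigned to 2 bins
S   <- c(1, 2, 3, 1, 3)                 # sample number of each peak
M   <- c(5000, 5010, 5005, 7000, 7020)  # peak center mass (m/z)
H   <- c(12, 15, 11, 4, 6)              # peak height
bin <- c(1, 1, 1, 2, 2)                 # bin index of each peak (assumed given)
nSamp <- 3
nBin  <- 2

# Biomarker matrix: one row per bin, one column per sample, NA where no peak
Bmrks <- matrix(NA_real_, nrow = nBin, ncol = nSamp)
Bmrks[cbind(bin, S)] <- H

# Bin bounds: masses of the left-most and right-most peaks in each bin
BinBounds <- cbind(tapply(M, bin, min), tapply(M, bin, max))

# Remove bins with fewer than SampFrac * nSamp peaks
SampFrac <- 0.7
keep <- rowSums(!is.na(Bmrks)) >= SampFrac * nSamp   # needs at least 2.1 peaks
Bmrks     <- Bmrks[keep, , drop = FALSE]
BinBounds <- BinBounds[keep, , drop = FALSE]
print(Bmrks)      # one remaining bin: 12 15 11
print(BinBounds)  # 5000 5010
```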

Author(s)

Jarek Tuszynski (SAIC) jaroslaw.w.tuszynski@saic.com

References

The initial version of this function started as an implementation of the algorithm described on the webpage of the Virginia Prostate Center (at Eastern Virginia Medical School) documenting their PeakMiner software. See http://www.evms.edu/vpc/seldi/peakminer.pdf

See Also

Examples

  # load "Data_IMAC.Rdata" file containing raw MS spectra 'X'  
  if (!file.exists("Data_IMAC.Rdata")) example("msc.project.read")
  load("Data_IMAC.Rdata")

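  # Find and align peaks
  Peaks = msc.peaks.find(X)
  # column 2 of Peaks holds the sample number; the last row's value gives the number of files
  cat(nrow(Peaks), "peaks were found in", Peaks[nrow(Peaks),2], "files.\n")
  Y = msc.peaks.align(Peaks)
  # print with samples as rows; "." marks bins where a sample has no peak (NA)
  print( t(Y$Bmrks) , na.print=".",  digits=2)
  # this data set yields 22 biomarkers (rows) for 40 samples (columns)
  stopifnot( dim(Y$Bmrks)==c(22, 40) )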

[Package caMassClass version 1.6 Index]