msc.peaks.clust {caMassClass}    R Documentation

Clusters Peaks of Mass Spectra

Description

Clusters peaks from multiple protein mass spectra (SELDI) samples.

Usage

  msc.peaks.clust(dM, S, BinSize=c(0,sum(dM)), tol=0.97, verbose=FALSE) 

Arguments

S Peak sample number, used to identify the spectrum the peak comes from.
dM Distance between sorted peak positions (masses, m/z).
BinSize Lower and upper bounds of bin sizes, based on the expected experimental variation in the mass (m/z) values. The size of a bin is measured as (R-L)/mean(R,L), where L and R are the masses (m/z values) of its left and right boundaries. All resulting bin sizes will be between BinSize[1] and BinSize[2]. The default, c(0,sum(dM)), ensures that the BinSize constraint is not used.
tol Gaps bigger than tol*max(gap) are assumed to be the same size as the largest gap. See Details.
verbose Boolean flag that turns debugging printouts on.
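
The bin-size measure and the tol rule described above can be illustrated with a minimal sketch in base R. The helper names rel.bin.size and tied.gaps are hypothetical and are not part of caMassClass; mean(R,L) is read here as the average of the two boundary masses, and the package's internal handling may differ.

```r
# Hypothetical helpers for illustration only; not part of caMassClass.

# Relative bin size: (R-L) divided by the average of the two boundaries.
# The average is written mean(c(L, R)); a literal mean(R, L) call in R
# would pass L as the 'trim' argument instead of averaging.
rel.bin.size = function(L, R) (R - L) / mean(c(L, R))

# Gaps larger than tol*max(gap) are treated as ties with the largest gap.
tied.gaps = function(gap, tol = 0.97) which(gap > tol * max(gap))

M = c(1000, 1002, 1003, 1010)      # made-up sorted peak masses
rel.bin.size(M[1], M[3])           # size of a bin spanning 1000..1003
tied.gaps(diff(M))                 # only the last gap (7) qualifies
tied.gaps(c(10, 9.8, 5, 9.6))      # gaps 1 and 2 exceed 0.97*10
```

With a relative measure like this, a fixed BinSize bound allows wider absolute bins at higher masses.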

Details

This is a low-level function used by msc.peaks.alignment and is not intended to be called directly by most users. However, it might be useful to other code developers. It clusters peaks from different samples into bins in such a way as to satisfy constraints in the following order:

Value

The output is a binary array of the same size as dM and S, in which the left boundary of each cluster bin (biomarker) is marked.
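
As a sketch of how such a marker array can be read, the following uses a hand-made marker vector (not actual output of msc.peaks.clust) and base R only. It assumes, as in the Examples, that the array has one entry per sorted peak.

```r
# Made-up data for illustration; 'bin' is NOT real output of msc.peaks.clust.
M   = c(1, 1, 3, 5, 5, 5, 7)       # sorted peak masses
bin = c(1, 0, 1, 1, 0, 0, 1)       # 1 marks the left boundary of each bin
id  = cumsum(bin)                  # bin number of every peak
split(M, id)                       # masses grouped by bin
tapply(M, id, range)               # lowest and highest mass in each bin
```

The cumulative sum turns boundary markers into bin numbers, which is the same step the Examples use to fill the biomarker matrix.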

Author(s)

Jarek Tuszynski (SAIC) jaroslaw.w.tuszynski@saic.com

References

The initial version of this function started as an implementation of the algorithm described on the webpage of the Virginia Prostate Center (at Virginia Medical School) documenting their PeakMiner software. See http://www.evms.edu/vpc/seldi/peakminer.pdf

See Also

Examples

  # example with simple made up data (18 peaks, 3 samples)
  M = c(1,5,8,12,17,22, 3,5,7,11,14,25, 1, 5, 7,10,17,21) # peak position/mass
  S = rep(1:3, each=6)               # peak's sample number
  idx = sort(M, index.return=TRUE)$ix # sort peaks by mass
  M   = M[idx]                       # sorted mass
  S   = S[idx]                       # arrange sample numbers in the same order
  bin = msc.peaks.clust(diff(M), S, verbose=TRUE) 
  rbind(S,M,bin)                     # show results
  
  # use the results to align peaks into biomarkers matrix
  Bmrks = matrix(NA,sum(bin),max(S)) # init feature (biomarker) matrix
  bin   = cumsum(bin)                # find bin numbers for each peak in S array
  for (j in seq_along(S))            # Bmrks usually stores height H of each peak
    Bmrks[bin[j], S[j]] = M[j]       # but in this example it will be mass
  Bmrks
  stopifnot( dim(Bmrks)==c(7,3) )
  stopifnot( sum(is.na(Bmrks[5,]))==2 )

[Package caMassClass version 1.6 Index]