msc.preprocess.run {caMassClass}    R Documentation

Preprocessing Pipeline of Protein Mass Spectra

Description

Pipeline for preprocessing protein mass spectra (SELDI) data before classification.

Usage

msc.preprocess.run ( X, mzXML=NULL,
    baseline.removal = 0,
      breaks=200, qntl=0, bw=0.005,                    # bslnoff
    min.mass = 3000,                                   # msc.mass.cut
    mass.drift.adjustment = 1,
      shiftPar=0.0005,                                 # msc.mass.adjust
    peak.extraction = 0, 
     PeakFile=0, SNR=2, span=c(81,11), zerothresh=0.9, # msc.peaks.find
     BmrkFile=0, BinSize=c(0.002, 0.008), tol=0.97,    # msc.peaks.align 
     FlBmFile=0, FillType=0.9,                         # msc.biomarkers.fill
    merge.copies = 1,                                  # msc.copies.merge
    verbose = TRUE) 

Arguments

X Spectrum data either in matrix format [nFeatures x nSamples] or in 3D array format [nFeatures x nSamples x nCopies]. Row names (rownames(X)) store the M/Z mass of each row.
mzXML Optional record of the experimental setup and of the processing done so far, prepared for possible output as an mzXML file.
baseline.removal Remove baseline from each spectrum? (boolean or 0/1 integer). See function msc.baseline.subtract and bslnoff from PROcess library for other parameters that can be passed: breaks, qntl and bw.
min.mass Mass (m/z) threshold used when removing data corresponding to low masses. See function msc.mass.cut for details.
mass.drift.adjustment Controls mass drift adjustment and scaling. If 0, no mass adjustment or scaling is performed; otherwise, the value is passed to the msc.mass.adjust function as scalePar: 1 means that afterwards all samples will have the same mean, 2 means that afterwards all samples will have the same mean and median. See function msc.mass.adjust for details and for the additional parameter shiftPar that can be passed.
peak.extraction Perform peak extraction and alignment, or keep working with the raw spectra? (boolean or 0/1 integer). See the functions named below for the other parameters that can be passed, especially the filenames used to store intermediate results.
PeakFile Optional filename for storing peak finding results. If provided, a CSV file is created in the same format as Ciphergen's peak-info file, with the following columns of data: "Spectrum.Tag", "Spectrum.", "Peak.", "Intensity" and "Substance.Mass".
BmrkFile Optional filename for storing peak alignment results. If provided, a CSV file is created in the same format as Ciphergen's biomarker file, with spectra (samples) as rows and biomarkers (features) as columns.
FlBmFile Optional filename for storing the results of msc.biomarkers.fill. If provided, a CSV file is created in the same format as Ciphergen's biomarker file, with spectra (samples) as rows and biomarkers as columns.
merge.copies Whether and how to merge multiple copies of the data, if they exist. Passed to the msc.copies.merge function as the mergeType variable. See that function for more details.
verbose Boolean flag; if TRUE, debugging printouts are turned on.
breaks parameter to be passed to bslnoff function from PROcess library by msc.baseline.subtract
qntl parameter to be passed to bslnoff function from PROcess library by msc.baseline.subtract
bw parameter to be passed to bslnoff function from PROcess library by msc.baseline.subtract
shiftPar parameter to be passed to msc.mass.adjust
SNR parameter to be passed to msc.peaks.find
span parameter to be passed to msc.peaks.find
zerothresh parameter to be passed to msc.peaks.find
BinSize parameter to be passed to msc.peaks.align
tol parameter to be passed to msc.peaks.align
FillType parameter to be passed to msc.biomarkers.fill
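
The layout expected for X can be illustrated with synthetic data. This is a minimal sketch in base R; the m/z values, sample names and intensities are made up for illustration and do not come from the package or from any real spectra.

```r
# Synthetic stand-in for SELDI spectra: 5 m/z positions, 4 samples, 2 copies.
# All values below are invented for illustration.
set.seed(1)
mz <- c(2950.5, 3010.2, 3500.7, 4200.1, 5100.9)
X <- array(runif(5 * 4 * 2), dim = c(5, 4, 2),
           dimnames = list(as.character(mz),
                           paste0("sample", 1:4),
                           c("copy1", "copy2")))

# The m/z mass of each row is carried in the row names.
mass <- as.numeric(rownames(X))

# A data set without copies is a plain matrix [nFeatures x nSamples].
X2d <- X[, , 1]
dim(X)    # 5 4 2
dim(X2d)  # 5 4
```

Keeping the masses in the row names means they stay attached to the intensities when rows are dropped or reordered.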

Details

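This function is a "pipeline" that chains several pre-processing steps preparing protein mass spectra (SELDI) data for classification. None of the steps needs class label information, and every step is optional and can be skipped. The steps, in the order in which they appear in Usage, are: baseline removal (msc.baseline.subtract, which uses bslnoff from the PROcess library), removal of low masses (msc.mass.cut), mass drift adjustment and scaling (msc.mass.adjust), peak finding (msc.peaks.find), peak alignment (msc.peaks.align), biomarker filling (msc.biomarkers.fill), and merging of copies (msc.copies.merge).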
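
The idea of a pipeline whose steps are switched on and off by arguments can be sketched in base R. The functions cut_low_mass, average_copies and toy_pipeline below are hypothetical stand-ins written for this illustration; they are not the caMassClass implementations, they cover only two of the steps, and the logical merge argument is a simplification of merge.copies.

```r
# Hypothetical stand-ins, NOT the caMassClass code.
# Drop rows whose m/z (stored in the row names) is below min.mass.
cut_low_mass <- function(X, min.mass) {
  keep <- as.numeric(rownames(X)) >= min.mass
  if (length(dim(X)) == 3) X[keep, , , drop = FALSE] else X[keep, , drop = FALSE]
}

# Collapse a 3D array [nFeatures x nSamples x nCopies] by averaging copies.
average_copies <- function(X) {
  if (length(dim(X)) == 3) apply(X, c(1, 2), mean) else X
}

# Each step runs only if its argument asks for it; no class labels are used.
toy_pipeline <- function(X, min.mass = 3000, merge = TRUE) {
  if (min.mass > 0) X <- cut_low_mass(X, min.mass)
  if (merge)        X <- average_copies(X)
  X
}

# Made-up data: 5 m/z positions, 4 samples, 2 copies.
set.seed(1)
mz <- c(2950.5, 3010.2, 3500.7, 4200.1, 5100.9)
X <- array(runif(5 * 4 * 2), dim = c(5, 4, 2),
           dimnames = list(as.character(mz), paste0("sample", 1:4), NULL))

Y  <- toy_pipeline(X)                              # both steps applied
Y0 <- toy_pipeline(X, min.mass = 0, merge = FALSE) # both steps skipped
dim(Y)   # 4 4
dim(Y0)  # 5 4 2
```

With every step switched off the data pass through unchanged, which is the behaviour the description above implies for skipped steps.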

Value

Returns a matrix containing features as rows and samples as columns, unless merge.copies was 0, 4, or 8, in which case no merging is done and the data are returned in the same or a similar format as the input [nFeatures x nSamples x nCopies]. Row names (rownames(X)) store the M/Z mass of each row. If the mzXML input argument was not NULL, an updated version of the mzXML record is returned as the "mzXML" attribute of the result.
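
A returned object of this shape can be inspected with base R alone. In the sketch below, Y is a small hand-made stand-in for the function's result, and the contents of its "mzXML" attribute are invented; only the dataProcessing field name is taken from the Examples section.

```r
# Hand-made stand-in for a returned matrix [nFeatures x nSamples].
Y <- matrix(c(1.5, 2.0, 0.7, 3.1, 0.2, 1.1), nrow = 3,
            dimnames = list(c("3010.2", "3500.7", "4200.1"), c("s1", "s2")))
# Invented processing record, attached the way the Value section describes.
attr(Y, "mzXML") <- list(dataProcessing = "step one\nstep two")

# Masses come back as row names; convert them to numbers before use.
mass <- as.numeric(rownames(Y))

# Copies were merged if the result has two dimensions rather than three.
merged <- length(dim(Y)) == 2

# The record is present only if an mzXML argument was supplied.
rec <- attr(Y, "mzXML")
steps <- if (!is.null(rec)) strsplit(rec$dataProcessing, "\n")[[1]] else character(0)
steps
```

Checking attr(Y, "mzXML") for NULL before use keeps the same code working whether or not an mzXML record was passed in.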

Author(s)

Jarek Tuszynski (SAIC) jaroslaw.w.tuszynski@saic.com

See Also

Examples

  # load "Data_IMAC.Rdata" file containing raw MS spectra 'X'  
  if (!file.exists("Data_IMAC.Rdata")) example("msc.project.read")
  load("Data_IMAC.Rdata") # load data: X & mzXML
  
  # run preprocess with peak extraction
  Y = msc.preprocess.run(X, mzXML=mzXML, peak.extraction=1, 
    PeakFile="peaks_IMAC.mzXML", BmrkFile="bmrk_IMAC.csv")
  cat("Size before: ", dim(X), " and after :", dim(Y), "\n")
  stopifnot( dim(Y)==c(25, 40) )        # make sure it is what's expected
  YmzXML = attr(Y, "mzXML")
  strsplit(YmzXML$dataProcessing, '\n') # show mzXML$dataProcessing record
  # inspect by hand output files: "peaks_IMAC.mzXML" & "bmrk_IMAC.csv"
  
  # run preprocess with no peak extraction
  Y = msc.preprocess.run(X, mzXML=mzXML)
  cat("Size before: ", dim(X), " and after :", dim(Y), "\n")
  stopifnot( dim(Y)==c(9377, 40) )      # make sure it is what's expected
  YmzXML = attr(Y, "mzXML")
  strsplit(YmzXML$dataProcessing, '\n') # show mzXML$dataProcessing record

[Package caMassClass version 1.6 Index]