msc.copies.merge {caMassClass}    R Documentation

Merge Multiple Copies of Mass Spectra Samples

Description

Protein mass spectra (SELDI) samples are sometimes scanned multiple times in order to reduce hardware- or software-based errors. The msc.copies.merge function merges, concatenates, and/or averages all of those copies in preparation for classification.

Usage

msc.copies.merge( X, mergeType, PeaksOnly=TRUE) 

Arguments

X Spectrum data in 3D array format [nFeatures x nSamples x nCopies]. Row names (rownames(X)) store the M/Z mass of each row. If X is in matrix format [nFeatures x nSamples], nothing is done.
mergeType An integer in the range [0, 11] that tells how to merge samples and what to do with bad copies:
  • 0 - do nothing
  • add 1 - if all original copies are to be concatenated as separate samples
  • add 2 - if copies are to be averaged and the average added as a separate sample
  • add 4 - if for each sample the worst copy is to be deleted
  • add 8 - if, for each sample with large differences between copies, a single bad copy is to be replaced with the best copy. Not to be used together with the previous option (add 4). See Details.
PeaksOnly This variable is passed to the function msc.sample.correlation. Set it to TRUE in case of raw spectra and to FALSE in case of data where only peaks (biomarkers) are present.
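
Since mergeType is built by adding flags, it can help to see how a value splits back into its parts. The following is a minimal sketch in base R; decode_merge_type is a hypothetical helper written for illustration and is not part of caMassClass.

```r
# Hypothetical helper (not part of caMassClass): split a mergeType value
# into the four flags listed under Arguments.
decode_merge_type <- function(mergeType) {
  # valid values are 0..11, because "add 4" and "add 8" cannot be combined
  stopifnot(length(mergeType) == 1, mergeType %in% 0:11)
  list(
    concatenate  = (mergeType %% 2) == 1,        # the "add 1" flag
    average      = (mergeType %% 4) %/% 2 == 1,  # the "add 2" flag
    delete_worst = (mergeType %/% 4) == 1,       # the "add 4" flag
    replace_bad  = (mergeType %/% 4) == 2        # the "add 8" flag
  )
}

flags <- decode_merge_type(1 + 2 + 4)
str(flags)
```

The value 1+2+4 used here is the same one passed to msc.copies.merge in the Examples section.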

Details

Quality of a sample is measured by calculating two variables for each copy of each sample: inner correlation (average correlation between multiple copies of the same sample) and outer correlation (average correlation between each sample and every other sample within the same copy). Inner correlation measures how similar the copies of a sample are to each other, and outer correlation measures how similar each copy is to all other samples. For example, in an experiment that uses SELDI technology to distinguish cancerous from non-cancerous samples, one can assume that most of the proteins present in both kinds of samples will be the same. In that case one expects high correlation between samples and even higher correlation between copies of the same sample.
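
The two quantities defined above can be illustrated on simulated data. This is only a sketch under stated assumptions: it uses plain Pearson correlation over all features via cor(), ignores the PeaksOnly setting, and is not the implementation used by msc.sample.correlation. The toy array X and the names inner and outer_cor are made up for the example.

```r
set.seed(1)
nFeatures <- 50; nSamples <- 4; nCopies <- 3

# Toy data (assumption): a profile shared by all samples, plus a
# sample-specific effect, plus independent noise for every copy.
base          <- rnorm(nFeatures)
sample_effect <- matrix(rnorm(nFeatures * nSamples, sd = 0.5), nFeatures, nSamples)
X <- array(0, dim = c(nFeatures, nSamples, nCopies))
for (k in seq_len(nCopies))
  X[, , k] <- base + sample_effect + rnorm(nFeatures * nSamples, sd = 0.2)

inner     <- matrix(NA_real_, nSamples, nCopies)  # one value per sample and copy
outer_cor <- matrix(NA_real_, nSamples, nCopies)

# Inner correlation: copy k of sample j against the other copies of sample j
for (j in seq_len(nSamples)) {
  Cj <- cor(X[, j, ])                  # nCopies x nCopies
  for (k in seq_len(nCopies)) inner[j, k] <- mean(Cj[k, -k])
}

# Outer correlation: sample j against the other samples within copy k
for (k in seq_len(nCopies)) {
  Ck <- cor(X[, , k])                  # nSamples x nSamples
  for (j in seq_len(nSamples)) outer_cor[j, k] <- mean(Ck[j, -j])
}

round(inner, 2)
round(outer_cor, 2)
```

In this simulation the copies of a sample share the sample-specific effect, so inner correlations come out higher than outer correlations, which matches the expectation described above.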

if mergeType/4 (mergeType %/% 4) is

Option 2 is more suitable for data with many copies, where one can afford to drop one copy. Option 1 is designed to patch the most serious problems with the data.

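There are also four merging options. If mergeType mod 4 (mergeType %% 4) is
  • 0 - no merging is done
  • 1 - all original copies are concatenated as separate samples
  • 2 - copies are averaged
  • 3 - all original copies are concatenated and their average is added as a separate sample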

In preparation for classification, multiple copies can be used in several ways: option 2 above improves (one hopes) the accuracy of each sample, while options 1 and 3 increase the number of samples available during classification. So the choice is: do we want many samples during classification, or fewer but better samples?

The best choice of mergeType depends on the kind of data.
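
The difference between concatenating and averaging copies can be sketched in base R on a small array. This is an illustration only, not the package's code: the array X, the object names, and the column naming scheme are assumptions made for the example.

```r
nFeatures <- 5; nSamples <- 3; nCopies <- 2
X <- array(seq_len(nFeatures * nSamples * nCopies),
           dim = c(nFeatures, nSamples, nCopies),
           dimnames = list(paste0("mz", 1:nFeatures),
                           paste0("s", 1:nSamples),
                           paste0("copy", 1:nCopies)))
dn <- dimnames(X)

# Concatenation: unfold the copies dimension so every copy is its own column
concatenated <- matrix(X, nrow = nFeatures,
  dimnames = list(dn[[1]], as.vector(outer(dn[[2]], dn[[3]], paste, sep = "."))))

# Averaging: one column per sample, the mean over its copies
averaged <- apply(X, c(1, 2), mean)
colnames(averaged) <- paste0(dn[[2]], ".mean")

# Both: concatenated copies plus the averages as extra samples
both <- cbind(concatenated, averaged)

dim(concatenated)  # features x (samples * copies)
dim(averaged)      # features x samples
dim(both)          # features x (samples * copies + samples)
```

Concatenation multiplies the number of columns by the number of copies, while averaging keeps one column per sample, which is the trade-off between more samples and better samples described above.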

Value

Returns a matrix containing features as rows and samples as columns, unless mergeType is 0, 4, or 8, in which case no merging is done and the data is returned in the same or a similar format as the input.

Author(s)

Jarek Tuszynski (SAIC) jaroslaw.w.tuszynski@saic.com

See Also

Examples

  # load "Data_IMAC.Rdata" file containing raw MS spectra 'X'  
  if (!file.exists("Data_IMAC.Rdata")) example("msc.project.read")
  load("Data_IMAC.Rdata")
  
  # run msc.copies.merge: concatenate copies (1), add their average (2),
  # and delete the worst copy of each sample (4)
  Y = msc.copies.merge(X, 1+2+4)
  colnames(Y)
  stopifnot( dim(Y)==c(11883,60) )

[Package caMassClass version 1.6 Index]