msc.copies.merge {caMassClass} | R Documentation |
Protein mass spectra (SELDI) samples are sometimes scanned
multiple times in order to reduce hardware- or software-based errors.
The msc.copies.merge function is used to merge, concatenate, and/or
average all of those copies together in preparation for classification.
msc.copies.merge( X, mergeType, PeaksOnly=TRUE)
X |
Spectrum data in 3D array format [nFeatures x nSamples x nCopies].
Row names (rownames(X)) store the M/Z mass of each row.
If X is in matrix format [nFeatures x nSamples] nothing will be done. |
mergeType |
an integer variable in [0,11] range, telling how to merge
samples (mergeType %% 4) and what to do with bad copies (mergeType %/% 4).
|
PeaksOnly |
This variable is passed to the msc.sample.correlation function.
Set it to TRUE for raw spectra and to FALSE for data where only peaks
(biomarkers) are present. |
The quality of a sample is measured by calculating two variables for each copy of each sample: inner correlation (the average correlation between multiple copies of the same sample) and outer correlation (the average correlation between each sample and every other sample within the same copy). Inner correlation measures how similar the copies are to each other, and outer correlation measures how similar each copy is to all other samples. For example, in an experiment that uses SELDI technology to distinguish cancerous from non-cancerous samples, one can assume that most of the proteins present in both groups will be the same. In that case one expects high correlation between samples and even higher correlation between copies of the same sample.
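The two quality measures can be illustrated with a small base-R sketch. This is only a reading of the definitions above, not the package's implementation: `copy_correlations` is a hypothetical helper, it assumes plain Pearson correlation over all features, and it ignores the PeaksOnly handling done by msc.sample.correlation.

```r
# Hypothetical helper (not part of caMassClass): average Pearson correlations
# computed directly from the definitions of inner and outer correlation.
copy_correlations <- function(X) {
  d <- dim(X)
  stopifnot(length(d) == 3, all(d >= 2))
  nSamples <- d[2]
  nCopies  <- d[3]
  innerCor <- matrix(NA_real_, nSamples, nCopies)
  outerCor <- matrix(NA_real_, nSamples, nCopies)

  for (k in seq_len(nCopies)) {
    C <- cor(X[, , k])   # sample-by-sample correlations within copy k
    diag(C) <- NA        # ignore each sample's correlation with itself
    outerCor[, k] <- rowMeans(C, na.rm = TRUE)
  }
  for (s in seq_len(nSamples)) {
    C <- cor(X[, s, ])   # copy-by-copy correlations for sample s
    diag(C) <- NA        # ignore each copy's correlation with itself
    innerCor[s, ] <- rowMeans(C, na.rm = TRUE)
  }
  list(innerCor = innerCor, outerCor = outerCor)
}

# Toy data (made up): 4 unrelated samples, each scanned twice with small noise
set.seed(1)
truth <- matrix(rnorm(50 * 4), nrow = 50)
X <- array(NA_real_, dim = c(50, 4, 2))
X[, , 1] <- truth + rnorm(50 * 4, sd = 0.1)
X[, , 2] <- truth + rnorm(50 * 4, sd = 0.1)

res <- copy_correlations(X)
round(res$innerCor, 2)   # one value per sample and copy
round(res$outerCor, 2)
```

In this toy data the copies of a sample differ only by small noise, so the inner correlation of every copy is higher than its outer correlation, which is the pattern expected of good copies.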
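How bad copies are handled depends on the value of mergeType/4
(mergeType %/% 4).
Option 2 rates the copies of each sample using the
score=outer_correlation + inner_correlation
measure and deletes the worst copy.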
Option 2 is more suitable in case of data with a lot of copies, when we can afford dropping one copy. Option 1 is designed to patch the most serious problems with the data.
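As a rough illustration of the score-based ranking, the sketch below picks the worst copy of each sample. The correlation values are invented, and treating the lowest score as the worst copy is an assumption, since the text does not spell out the direction of the ranking or how the deletion itself is carried out.

```r
# Hypothetical correlation values for 3 samples (rows) x 3 copies (columns)
innerCor <- matrix(c(0.95, 0.60, 0.94,
                     0.90, 0.91, 0.89,
                     0.97, 0.96, 0.70), nrow = 3, byrow = TRUE)
outerCor <- matrix(c(0.80, 0.75, 0.81,
                     0.78, 0.79, 0.77,
                     0.82, 0.80, 0.60), nrow = 3, byrow = TRUE)

# Score from the text: outer correlation plus inner correlation
score <- outerCor + innerCor

# Assumption: the copy with the lowest score is the worst copy of its sample
worstCopy <- apply(score, 1, which.min)
worstCopy
```

Each entry of `worstCopy` is the index of the copy that this reading would drop for the corresponding sample.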
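There are also four merging options, depending on the value of mergeType mod 4
(mergeType %% 4):
0 - no merging is done
1 - concatenate all copies, so they seem as separate samples: X = cbind(X[,,1], X[,,2], ..., X[,,nCopy])
2 - average all copies: X = (X[,,1] + X[,,2] + ... + X[,,nCopy])/nCopy
3 - concatenate all copies plus their average Xavr: X = cbind(X[,,1], X[,,2], ..., X[,,nCopy], Xavr)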
In preparation for classification one can use multiple copies in several ways: option 2 above improves (one hopes) accuracy of each sample, while options 1 and 3 increase number of samples available during classification. So the choice is: do we want a lot of samples during classification or fewer, better samples?
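The three merging formulas (concatenate, average, concatenate plus average) can be tried on a toy array. This sketch uses only base R and made-up numbers; it shows the resulting shapes and does not reproduce any bad-copy handling or missing-value treatment the package may apply.

```r
# Toy 3D array (made up): 4 features x 3 samples x 2 copies
X <- array(seq_len(4 * 3 * 2), dim = c(4, 3, 2))
nCopy <- dim(X)[3]

# Concatenate: copies are placed side by side and look like separate samples
Xcat <- do.call(cbind, lapply(seq_len(nCopy), function(k) X[, , k]))

# Average: one column per sample, averaged over the copy dimension
Xavr <- apply(X, c(1, 2), mean)

# Concatenate plus average: all copies followed by their average
Xall <- cbind(Xcat, Xavr)

dim(Xcat)
dim(Xavr)
dim(Xall)
```

Concatenation multiplies the number of columns by the number of copies, averaging keeps one column per sample, and the combined form carries both.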
The best choice of mergeType
depends on the kind of data.
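Because mergeType packs two choices into one integer, it can help to decode a value before using it. The sketch below relies only on R's integer division and modulo operators; the variable names `badCopyOption` and `mergeOption` are introduced here for illustration and are not part of the package.

```r
# Value used in the example on this page
mergeType <- 1 + 2 + 4

badCopyOption <- mergeType %/% 4   # integer division: bad-copy handling option
mergeOption   <- mergeType %% 4    # remainder: merging option

# Decode every valid mergeType in the [0,11] range
codes <- 0:11
decoded <- data.frame(mergeType     = codes,
                      badCopyOption = codes %/% 4,
                      mergeOption   = codes %% 4)
decoded
```

Going the other way, a mergeType can be composed as 4 times the bad-copy option plus the merging option.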
Returns a matrix containing features as rows and samples as columns, unless
mergeType is 0, 4, or 8, in which case no merging is done and the data is
returned in the same or a similar format as the input.
Jarek Tuszynski (SAIC) jaroslaw.w.tuszynski@saic.com
msc.preprocess.run and msc.project.run pipelines.
msc.mass.adjust or peak finding functions: msc.peaks.find, msc.peaks.align, and msc.biomarkers.fill
msc.classifier.test
msc.sample.correlation
# load "Data_IMAC.Rdata" file containing raw MS spectra 'X'
if (!file.exists("Data_IMAC.Rdata")) example("msc.project.read")
load("Data_IMAC.Rdata")

# run msc.copies.merge
Y = msc.copies.merge(X, 1+2+4)
colnames(Y)
stopifnot( dim(Y)==c(11883,60) )