msc.preprocess.run {caMassClass} | R Documentation |
Pipeline for preprocessing protein mass spectra (SELDI) data before classification.
msc.preprocess.run( X, mzXML=NULL,
    baseline.removal = 0, breaks=200, qntl=0, bw=0.005,    # bslnoff
    min.mass = 3000,                                       # msc.mass.cut
    mass.drift.adjustment = 1, shiftPar=0.0005,            # msc.mass.adjust
    peak.extraction = 0, PeakFile=0, SNR=2, span=c(81,11),
    zerothresh=0.9,                                        # msc.peaks.find
    BmrkFile=0, BinSize=c(0.002, 0.008), tol=0.97,         # msc.peaks.align
    FlBmFile=0, FillType=0.9,                              # msc.biomarkers.fill
    merge.copies = 1,                                      # msc.copies.merge
    verbose = TRUE)
X |
Spectrum data either in matrix format [nFeatures x nSamples] or in
3D array format [nFeatures x nSamples x nCopies]. Row names
(rownames(X)) store the M/Z mass of each row. |
mzXML |
Optional record of the experimental setup and processing so far, being prepared for possible output as an mzXML file. |
baseline.removal |
Remove baseline from each spectrum? (boolean or 0/1
integer). See function msc.baseline.subtract and bslnoff
from the PROcess library for other parameters that can be passed:
breaks, qntl and bw. |
min.mass |
Cutting place when removing data corresponding to low masses
(m/z). See function msc.mass.cut for details. |
mass.drift.adjustment |
Controls mass drift adjustment and scaling.
If 0 then no mass adjustment or scaling will be performed; otherwise, it is
passed to the msc.mass.adjust function as scalePar. Because
of that: 1 means that afterwards all samples will have the same mean, 2
means that afterwards all samples will have the same mean and median. See
function msc.mass.adjust for details and for the additional parameter
shiftPar that can be passed. |
peak.extraction |
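Perform peak extraction and alignment, or keep on
working with the raw spectra? (boolean or 0/1 integer). See the following
functions for other parameters that can be passed:
msc.peaks.find (SNR, span and zerothresh), msc.peaks.align (BinSize and tol)
and msc.biomarkers.fill (FillType). |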
PeakFile |
Optional filename for storing peak finding results. If provided, then a CSV file will be created in the same format as Ciphergen's peak-info file, with the following columns of data: "Spectrum.Tag", "Spectrum.", "Peak.", "Intensity" and "Substance.Mass". |
BmrkFile |
Optional filename for storing peak alignment results. If provided, then a CSV file will be created in the same format as Ciphergen's biomarker file, with spectra (samples) as rows, and biomarkers (features) as columns. |
FlBmFile |
Optional filename for storing the results of
msc.biomarkers.fill. If provided, then a CSV file will be created
in the same format as Ciphergen's biomarker file, with spectra (samples)
as rows, and biomarkers as columns. |
merge.copies |
If multiple copies of the data exist, should they be
merged, and how? Passed to the msc.copies.merge function as the
mergeType variable. See that function for more details. |
verbose |
Boolean flag that turns debugging printouts on. |
breaks |
parameter to be passed to bslnoff
function from PROcess library by msc.baseline.subtract |
qntl |
parameter to be passed to bslnoff
function from PROcess library by msc.baseline.subtract |
bw |
parameter to be passed to bslnoff
function from PROcess library by msc.baseline.subtract |
shiftPar |
parameter to be passed to msc.mass.adjust |
SNR |
parameter to be passed to msc.peaks.find |
span |
parameter to be passed to msc.peaks.find |
zerothresh |
parameter to be passed to msc.peaks.find |
BinSize |
parameter to be passed to msc.peaks.align |
tol |
parameter to be passed to msc.peaks.align |
FillType |
parameter to be passed to msc.biomarkers.fill |
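The input layout described for X can be illustrated with a minimal base R sketch. The data are synthetic, and the m/z values, sample names and copy names are made up for illustration. The call to msc.preprocess.run is shown only as a comment because it needs the caMassClass package.

```r
# Synthetic data only: 5 m/z values, 4 samples, 2 copies (replicate measurements)
set.seed(1)
mz <- c(2500, 3000, 3500, 4000, 4500)   # hypothetical m/z axis

# 3D array format [nFeatures x nSamples x nCopies]; row names hold the m/z values
X3 <- array(runif(5 * 4 * 2), dim = c(5, 4, 2),
            dimnames = list(as.character(mz),
                            paste0("sample", 1:4),
                            paste0("copy", 1:2)))

# Recover the m/z mass of each row as numbers
mass <- as.numeric(rownames(X3))

# A single-copy data set is just a matrix [nFeatures x nSamples]
X2 <- X3[, , 1]

# With caMassClass installed, either object could be passed on, for example:
# Y <- msc.preprocess.run(X3, min.mass = 3000, merge.copies = 1)
```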
Function containing several pre-processing steps that prepare protein mass spectra (SELDI) data for classification. This function is a "pipeline" performing several operations, none of which needs class label information. Every step is optional and can be skipped:
Baseline removal, using msc.baseline.subtract and bslnoff from the PROcess library.
Removal of data corresponding to low masses, using msc.mass.cut.
Mass drift adjustment and scaling, using msc.mass.adjust.
Peak extraction and alignment, using msc.peaks.find, msc.peaks.align and msc.biomarkers.fill.
Merging of multiple copies, using msc.copies.merge.
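The "every step is optional" structure can be sketched in base R as a chain of conditional steps. The functions toy.mass.cut, toy.scale.mean and toy.preprocess are hypothetical stand-ins written for this illustration. They are not the caMassClass implementations, and the exact cut rule and scaling used by the package may differ.

```r
# Illustrative stand-ins only; these are NOT the caMassClass implementations.

# Drop rows whose m/z (stored in the row names) is below min.mass
toy.mass.cut <- function(X, min.mass) {
  keep <- as.numeric(rownames(X)) >= min.mass
  X[keep, , drop = FALSE]
}

# Rescale each sample (column) so that all columns share the same mean
toy.scale.mean <- function(X) {
  sweep(X, 2, colMeans(X) / mean(X), "/")
}

# Each step runs only if its controlling argument is non-zero
toy.preprocess <- function(X, min.mass = 3000, mass.drift.adjustment = 1) {
  if (min.mass > 0) X <- toy.mass.cut(X, min.mass)
  if (mass.drift.adjustment != 0) X <- toy.scale.mean(X)
  X
}

# Synthetic spectra: 5 m/z values x 4 samples
set.seed(42)
mz <- c(2500, 3000, 3500, 4000, 4500)
X <- matrix(runif(20, min = 1, max = 10), nrow = 5,
            dimnames = list(as.character(mz), paste0("s", 1:4)))

Y  <- toy.preprocess(X)                                         # both steps
Y0 <- toy.preprocess(X, min.mass = 0, mass.drift.adjustment = 0) # all skipped
```

With both steps switched off the input comes back unchanged, which mirrors how zero-valued switches skip steps in the pipeline described above.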
Returns a matrix containing features as rows and samples as columns, unless
merge.copies was 0, 4, or 8, in which case no merging is done and the data is
returned in the same or similar format as the input
[nFeatures x nSamples x nCopies].
Row names (rownames(X)) store the M/Z mass of each row.
If the mzXML input argument was not NULL, then an updated version of the mzXML
record is returned as the "mzXML" attribute of the result.
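The shape of the returned object can be mimicked with a small base R sketch. It assumes, for illustration only, that merging copies means averaging them, and it attaches a made-up list as the "mzXML" attribute. Neither is taken from the package's code.

```r
# Synthetic 3D input: 3 m/z values x 4 samples x 2 copies
X3 <- array(1:24, dim = c(3, 4, 2),
            dimnames = list(c("3000", "3500", "4000"), NULL, NULL))

# Assumed merge rule for this sketch: average over the copies dimension
Y <- apply(X3, c(1, 2), mean)

# Hypothetical processing record, attached the way the "mzXML" attribute is described
attr(Y, "mzXML") <- list(dataProcessing = "step 1\nstep 2")

dim(Y)                                   # features x samples
mass  <- as.numeric(rownames(Y))         # m/z mass of each row
steps <- strsplit(attr(Y, "mzXML")$dataProcessing, "\n")[[1]]
```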
Jarek Tuszynski (SAIC) jaroslaw.w.tuszynski@saic.com
Input data can be read with the msc.project.read or msc.rawMS.read.csv functions.
Pipeline steps: msc.baseline.subtract, bslnoff from the PROcess library, msc.mass.cut, msc.mass.adjust, msc.peaks.find, msc.peaks.align, msc.biomarkers.fill, and msc.copies.merge.
Output can be passed to the msc.classifier.test function.
# load "Data_IMAC.Rdata" file containing raw MS spectra 'X'
if (!file.exists("Data_IMAC.Rdata")) example("msc.project.read")
load("Data_IMAC.Rdata") # load data: X & mzXML

# run preprocess with peak extraction
Y = msc.preprocess.run(X, mzXML=mzXML, peak.extraction=1,
    PeakFile="peaks_IMAC.mzXML", BmrkFile="bmrk_IMAC.csv")
cat("Size before: ", dim(X), " and after :", dim(Y), "\n")
stopifnot( dim(Y)==c(25, 40) ) # make sure it is what's expected
YmzXML = attr(Y, "mzXML")
strsplit(YmzXML$dataProcessing, '\n') # show mzXML$dataProcessing record
# inspect by hand output files: "peaks_IMAC.mzXML" & "bmrk_IMAC.csv"

# run preprocess with no peak extraction
Y = msc.preprocess.run(X, mzXML=mzXML)
cat("Size before: ", dim(X), " and after :", dim(Y), "\n")
stopifnot( dim(Y)==c(9377, 40) ) # make sure it is what's expected
YmzXML = attr(Y, "mzXML")
strsplit(YmzXML$dataProcessing, '\n') # show mzXML$dataProcessing record