msc.biomarkers.fill {caMassClass} | R Documentation |
Fill empty spaces (NA's) in the biomarker matrix created by msc.peaks.align
msc.biomarkers.fill( X, Bmrks, BinBounds, FillType=0.9)
X |
Spectrum data either in matrix format [nFeatures x nSamples] or in
3D array format [nFeatures x nSamples x nCopies]. Row names
(rownames(X)) store M/Z mass of each row. |
Bmrks |
biomarker matrix containing one sample per column and one biomarker per row |
BinBounds |
position (mass) of left-most and right-most peak in each bin |
FillType |
how to fill empty spaces in biomarker data?
|
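As a rough illustration of the argument shapes described above, the following base-R sketch builds toy versions of X and Bmrks. All sizes, masses, and intensities are invented for illustration and do not come from the package's example data.

```r
# Toy spectra: 6 features (M/Z values) by 4 samples; all numbers are invented.
set.seed(1)
mz <- c(1000, 1005, 1010, 2000, 2005, 2010)
X  <- matrix(runif(6 * 4), nrow = 6, ncol = 4,
             dimnames = list(as.character(mz), paste0("s", 1:4)))

# rownames(X) store the M/Z mass of each row (as character strings),
# so they have to be converted back to numbers before use.
mass <- as.numeric(rownames(X))

# Toy biomarker matrix: one biomarker per row, one sample per column,
# with NA wherever a peak was not found in that sample.
Bmrks <- matrix(c(0.8, NA,  0.7, 0.9,
                  NA,  0.4, NA,  0.5),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("1005", "2005"), colnames(X)))
```

A 3D X would simply add a third dimension for copies, for example array(..., dim = c(nFeatures, nSamples, nCopies)).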
This function attempts to correct a problem that is a side effect of the msc.peaks.align function: numerous NA's appear in the biomarker data whenever a peak is found in only some of the samples. msc.peaks.align already removes the most problematic features using its SampFrac variable, but many NA's are likely to remain, and they can cause problems for some classification algorithms.
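To see how widespread the remaining NA's are before deciding whether to fill them, something like the following base-R sketch can help. The toy matrix is invented; with real data one would substitute the biomarker matrix returned by msc.peaks.align.

```r
# Toy biomarker matrix (values invented): rows are biomarkers, columns are samples.
Bmrks <- matrix(c(0.8, NA,  0.7, 0.9,
                  NA,  0.4, NA,  0.5,
                  0.3, 0.2, 0.6, 0.1),
                nrow = 3, byrow = TRUE)

naPerBiomarker <- rowSums(is.na(Bmrks))    # missing samples for each biomarker
naPerSample    <- colSums(is.na(Bmrks))    # missing biomarkers for each sample
fracPresent    <- rowMeans(!is.na(Bmrks))  # share of samples in which the peak was found

cat("NA's per biomarker:", naPerBiomarker, "\n")
cat("NA's per sample:   ", naPerSample, "\n")
```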
Data in the same format and size as Bmrks
The whole idea of filling spaces in a biomarker matrix is somewhat suspect, since we are mixing the proverbial apples and oranges. However, it might be better than the other options: filling empty spaces with zeros or keeping the NA's.
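For comparison, the two alternatives mentioned in this note can be sketched in base R as follows. This is only a toy illustration with invented values; it does not reproduce what msc.biomarkers.fill itself does.

```r
# Toy biomarker matrix (values invented): rows are biomarkers, columns are samples.
Bmrks <- matrix(c(0.8, NA,  0.7, 0.9,
                  NA,  0.4, NA,  0.5),
                nrow = 2, byrow = TRUE)

# Option 1: replace every NA with zero.
zeroFilled <- Bmrks
zeroFilled[is.na(zeroFilled)] <- 0

# Option 2: keep the NA's and skip them explicitly in each computation.
meansKeepNA <- rowMeans(Bmrks, na.rm = TRUE)
meansZero   <- rowMeans(zeroFilled)

# Zero filling pulls the per-biomarker means down.
print(rbind(keepNA = meansKeepNA, zeroFill = meansZero))
```

Keeping NA's leaves the summary statistics untouched but requires every downstream function to handle missing values, which is the difficulty for classification algorithms that this function tries to avoid.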
Jarek Tuszynski (SAIC) jaroslaw.w.tuszynski@saic.com
msc.preprocess.run and msc.project.run pipelines.
msc.peaks.align, or from Ciphergen's software
msc.copies.merge
msc.biomarkers.read.csv and msc.biomarkers.write.csv.
pk2bmkr from the PROcess package also performs a similar function.
# load 'X' and 'Y' calculated in example("msc.peaks.align")
example("msc.peaks.align")
nNA = sum(is.na(Y$Bmrks))
cat( "dim(Y$Bmrks)=", dim(Y$Bmrks), "; number of NA's is ", nNA, "\n")
stopifnot(nNA==232)

# run msc.biomarkers.fill
Z = msc.biomarkers.fill( X, Y$Bmrks, Y$BinBounds)
nNA = sum(is.na(Z))
cat( "dim(Z)=", dim(Z), "; number of NA's is ", nNA, "\n")
stopifnot( dim(Z)==c(22, 20, 2) )
stopifnot(nNA==0)

# run msc.biomarkers.fill with other FillType
Z = msc.biomarkers.fill( X, Y$Bmrks, Y$BinBounds, FillType=2)