msc.biomarkers.fill {caMassClass} R Documentation

Fill Empty Spaces in Biomarker Matrix

Description

Fills empty spaces (NA's) in the biomarker matrix created by msc.peaks.align.

Usage

msc.biomarkers.fill( X, Bmrks, BinBounds, FillType=0.9)

Arguments

X Spectrum data either in matrix format [nFeatures x nSamples] or in 3D array format [nFeatures x nSamples x nCopies]. Row names (rownames(X)) store M/Z mass of each row.
Bmrks biomarker matrix containing one sample per column and one biomarker per row
BinBounds position (mass) of left-most and right-most peak in each bin
FillType how to fill empty spaces in biomarker data?
  • if 0<=FillType<=1 then fill spaces with quantile(probs=FillType). For example: if FillType=1/2 then the median will be used, if FillType=1 then the maximum value will be used, and if FillType=0.9 then the maximum will be used after discarding 10% of "outliers" (i.e. the 90th percentile)
  • if FillType<0 then empty spaces will not be filled and NA's will remain
  • if FillType==2 then the X value closest to the center of the bin will be used
  • if FillType==3 then empty spaces will be set to zero
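
The FillType options above can be sketched on a single bin of a single sample. This is a toy illustration in base R, not the caMassClass implementation. It assumes that the quantile is taken over the raw X intensities whose masses fall inside the bin, and that the "center of the bin" is the midpoint of the two bin bounds. The helper fill_value and the vectors mz and inten are hypothetical names introduced only for this example.

```r
# Toy illustration only; NOT the code used inside msc.biomarkers.fill.
# Assumption: fill values are derived from the raw intensities of one
# sample whose M/Z values fall inside one bin.
mz    <- c(1000, 1001, 1002, 1003, 1004)  # hypothetical M/Z values in the bin
inten <- c(2, 4, 6, 8, 10)                # hypothetical intensities of one sample

# Hypothetical helper mirroring the documented FillType rules
fill_value <- function(inten, mz, FillType) {
  if (FillType < 0) return(NA_real_)      # leave the NA in place
  if (FillType <= 1)                      # quantile of the values in the bin
    return(unname(quantile(inten, probs = FillType)))
  if (FillType == 2) {                    # value nearest the bin center
    center <- mean(range(mz))             # assumed: midpoint of bin bounds
    return(inten[which.min(abs(mz - center))])
  }
  if (FillType == 3) return(0)            # fill with zero
  stop("unsupported FillType")
}

fill_value(inten, mz, 0.5)  # median of the bin
fill_value(inten, mz, 1)    # maximum of the bin
fill_value(inten, mz, 0.9)  # 90th percentile (default quantile type 7)
fill_value(inten, mz, 2)    # intensity at the M/Z closest to the bin center
fill_value(inten, mz, -1)   # NA is kept
```

With these toy values, FillType=0.9 gives a value between the two largest intensities, which is what "maximum after discarding 10% of outliers" amounts to on a small bin.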

Details

This function attempts to correct a problem that is a side effect of the msc.peaks.align function: numerous NA's appear in the biomarker data whenever a peak is found in only some of the samples. msc.peaks.align already removes the most problematic features using the SampFrac variable, but many NA's are likely to remain, and they can cause problems for some classification algorithms.
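
To see how much of a problem the remaining NA's are, it can help to count them per biomarker and per sample before filling. The sketch below uses a small made-up 2D matrix (hypothetical values, one biomarker per row and one sample per column); a 3D array with copies would need the same checks applied per copy.

```r
# Hypothetical toy biomarker matrix: 3 biomarkers (rows) x 4 samples (columns)
Bmrks <- matrix(c( 5, NA,  7,  6,
                  NA, NA,  3, NA,
                   1,  2,  2,  1), nrow = 3, byrow = TRUE)

naPerBmrk <- rowSums(is.na(Bmrks))          # NA count for each biomarker
fracFound <- 1 - naPerBmrk / ncol(Bmrks)    # fraction of samples with a peak

# Samples that a classifier unable to handle NA's could still use:
# transpose so that samples are rows, then keep rows without any NA
usable <- complete.cases(t(Bmrks))

cat("NA's per biomarker:", naPerBmrk, "\n")
cat("fraction of samples with peak:", fracFound, "\n")
cat("samples without any NA:", sum(usable), "of", ncol(Bmrks), "\n")
```

In this toy case a single sparse biomarker leaves only one sample complete, which illustrates why the NA's are either filled or the sparse rows dropped before classification.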

Value

Data in the same format and size as Bmrks

Note

The whole idea of filling spaces in biomarker matrix is a little bit suspect since we are mixing proverbial apples and oranges. However, it might be better than the other options of filling empty spaces with zeros or keeping NA's.

Author(s)

Jarek Tuszynski (SAIC) jaroslaw.w.tuszynski@saic.com

See Also

Examples

  # load 'X' and 'Y' calculated in example("msc.peaks.align")  
  example("msc.peaks.align")
  nNA = sum(is.na(Y$Bmrks))
  cat( "dim(Y$Bmrks)=", dim(Y$Bmrks), "; number of NA's is ", nNA,"\n")
  stopifnot(nNA==232)
  
  # run msc.biomarkers.fill
  Z = msc.biomarkers.fill( X, Y$Bmrks, Y$BinBounds)
  nNA = sum(is.na(Z))
  cat( "dim(Z)=", dim(Z), "; number of NA's is ", nNA,"\n")
  stopifnot( dim(Z)==c(22, 20, 2) )
  stopifnot(nNA==0)

  # run msc.biomarkers.fill with other FillType
  Z = msc.biomarkers.fill( X, Y$Bmrks, Y$BinBounds, FillType=2)
 

[Package caMassClass version 1.6 Index]