msc.mass.adjust {caMassClass} R Documentation

Perform Normalization and Mass Drift Adjustment for Mass Spectra Data.

Description

Perform normalization and mass drift adjustment for protein mass spectra data (for example, SELDI). This process is also referred to as removal of "phase variation" in MS data by peak alignment, "profile alignment", or "mass calibration".

Usage

  msc.mass.adjust(X, scalePar=2, shiftPar=0.0005, AvrSamp=0)
  msc.mass.adjust.calc(X, scalePar=2, shiftPar=0.0005, AvrSamp=0)
  msc.mass.adjust.apply(X, shiftX, scaleY, shiftY) 

Arguments

X Spectrum data either in matrix format [nFeatures x nSamples] or in 3D array format [nFeatures x nSamples x nCopies]. Row names (rownames(X)) store M/Z mass of each row.
scalePar Controls scaling (normalization): 1 means that afterwards all samples will have the same mean; 2 means that afterwards all samples will have the same mean and median (default).
shiftPar Controls mass adjustment. Shifting a sample has to improve its correlation by at least this amount for the shift to be considered. Designed to prevent shifts based on "improvements" on the order of magnitude of machine accuracy. Setting it too large turns off shifting. Default = 0.0005.
AvrSamp Used to normalize a test set the same way the train set was normalized. The test set is processed using the AvrSamp array that was one of the outputs of the train-set mass adjustment. See Examples.
shiftX matrix [nSamp x nCopy] - integer number of positions a sample should be shifted to the right (+) or left (-). Output from msc.mass.adjust.calc and input to msc.mass.adjust.apply.
scaleY matrix [nSamp x nCopy] - multiply each sample in order to normalize it. Output from msc.mass.adjust.calc and input to msc.mass.adjust.apply.
shiftY matrix [nSamp x nCopy] - subtract this number from scaled sample (if matching medians). Output from msc.mass.adjust.calc and input to msc.mass.adjust.apply.
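
The roles of scaleY and shiftY can be illustrated with a small base-R sketch. It assumes that a normalized sample is computed as scaleY * y - shiftY, which matches the argument descriptions above but is not taken from the package source. The helper norm_params and the toy vector are hypothetical and serve only to show how matching the mean alone (as with scalePar=1) differs from matching both mean and median (as with scalePar=2).

```r
# Hypothetical helper (not part of caMassClass): find scaleY and shiftY so
# that scaleY * y - shiftY has the requested mean and, optionally, median.
norm_params <- function(y, target_mean, target_median = NULL) {
  if (is.null(target_median)) {
    # scalePar = 1 style: match the mean only, no offset needed
    return(list(scaleY = target_mean / mean(y), shiftY = 0))
  }
  # scalePar = 2 style: two unknowns, two linear equations
  #   scaleY * mean(y)   - shiftY = target_mean
  #   scaleY * median(y) - shiftY = target_median
  scaleY <- (target_mean - target_median) / (mean(y) - median(y))
  shiftY <- scaleY * mean(y) - target_mean
  list(scaleY = scaleY, shiftY = shiftY)
}

y  <- c(1, 2, 3, 4, 10)   # toy "spectrum": mean 4, median 3
p1 <- norm_params(y, target_mean = 2)
p2 <- norm_params(y, target_mean = 6, target_median = 4)
z1 <- p1$scaleY * y - p1$shiftY   # mean matched only
z2 <- p2$scaleY * y - p2$shiftY   # mean and median matched
```

This sketch breaks down when a sample's mean equals its median (division by zero); how the package handles that case is not covered here.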

Details

Mass adjustment assumes that SELDI data has some error associated with inaccuracy in setting the starting point of the time measurement (x-axis origin, or zero M/Z value). We try to correct this error by allowing the samples to shift a few time steps to the left or to the right, if doing so improves their cross-correlation with other samples. The function performs the following steps
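
As a rough illustration of the shifting idea only (this is not the package's actual procedure), the base-R sketch below tries a few integer shifts of one sample against a reference and keeps the best shift only if it improves the correlation by at least shiftPar. The helpers shift_vec and best_shift, the max_shift limit, the edge padding, and the toy peak are all assumptions made for this example.

```r
# Hypothetical helpers, for illustration only (not caMassClass code).

# Shift a vector by k positions: k > 0 moves values to the right, k < 0 to
# the left. Vacated positions are padded with the nearest edge value.
shift_vec <- function(y, k) {
  n <- length(y)
  if (k > 0) return(c(rep(y[1], k), y[1:(n - k)]))
  if (k < 0) return(c(y[(1 - k):n], rep(y[n], -k)))
  y
}

# Try shifts in -max_shift..max_shift and keep the best one, but only if it
# improves the correlation with the reference by at least shiftPar.
best_shift <- function(y, ref, max_shift = 3, shiftPar = 0.0005) {
  shifts <- -max_shift:max_shift
  cors   <- sapply(shifts, function(k) cor(shift_vec(y, k), ref))
  base   <- cors[shifts == 0]          # correlation without any shift
  best   <- which.max(cors)
  if (cors[best] - base >= shiftPar) shifts[best] else 0L
}

# Toy data: a single Gaussian-shaped peak and a copy displaced 2 steps right
x   <- 1:100
ref <- exp(-((x - 50) / 4)^2)
y   <- shift_vec(ref, 2)
k   <- best_shift(y, ref)   # shift that realigns y with ref (negative = left)
```

With the sign convention used for shiftX (right is positive, left is negative), the displaced copy needs a negative shift to line up with the reference again, while a sample that already matches the reference is left unshifted because no candidate clears the shiftPar threshold.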

The msc.mass.adjust function was split into two parts (one to calculate parameters and one to apply them) in order to give users more flexibility and more information about what is done to the data. This split allows inspection, plotting and/or modification of the shiftX, shiftY and scaleY parameters before the data is modified. For example, one can set shiftX to zero to perform normalization without mass adjustment, or set shiftY to zero and scaleY to one to perform mass adjustment without normalization. The three functions provided are:

msc.mass.adjust.calc calculates and returns the shift and scaling parameters without modifying the data.
msc.mass.adjust.apply applies those parameters to the data.
msc.mass.adjust performs both steps in a single call.
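
To make the effect of editing the parameters concrete, here is a minimal base-R sketch of an "apply" step on a 2D matrix. The function apply_adjust is a hypothetical stand-in, not the implementation of msc.mass.adjust.apply; it assumes each sample is shifted by shiftX rows (edges padded with the first or last value), multiplied by scaleY, and then reduced by shiftY.

```r
# Hypothetical stand-in for the "apply" step on a 2D matrix
# [nFeatures x nSamples]; not the caMassClass implementation.
apply_adjust <- function(X, shiftX, scaleY, shiftY) {
  n <- nrow(X)
  for (j in seq_len(ncol(X))) {
    k <- shiftX[j]
    # Source row for each output row (k > 0 shifts right, k < 0 left),
    # clamped to the valid range so edges repeat the first/last value.
    idx <- pmin(pmax(seq_len(n) - k, 1), n)
    X[, j] <- scaleY[j] * X[idx, j] - shiftY[j]
  }
  X
}

# Two toy samples whose peaks sit one row apart
X <- cbind(s1 = c(0, 1, 5, 1, 0), s2 = c(0, 0, 2, 10, 2))

# Normalization only: all shifts set to zero
Ynorm  <- apply_adjust(X, shiftX = c(0, 0), scaleY = c(2, 1), shiftY = c(0, 0))

# Mass adjustment only: unit scale and zero offset
Yshift <- apply_adjust(X, shiftX = c(0, -1), scaleY = c(1, 1), shiftY = c(0, 0))
```

In the first call only the intensities change; in the second only the peak position of the second sample changes, which brings its peak onto the same row as the first sample's peak.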

Value

Functions msc.mass.adjust and msc.mass.adjust.apply return modified spectra in the same format and size as X. Function msc.mass.adjust.calc returns a list containing the following:

shiftX matrix [nSamp x nCopy] - integer number of positions a sample should be shifted to the right (+) or left (-)
scaleY matrix [nSamp x nCopy] - multiply each sample in order to normalize it
shiftY matrix [nSamp x nCopy] - subtract this number from scaled sample (if matching medians)
AvrSamp Use the AvrSamp returned from train-set mass adjustment to process the test set

Author(s)

Jarek Tuszynski (SAIC) jaroslaw.w.tuszynski@saic.com

References

Description of more elaborate algorithm for similar purpose can be found in Lin S., Haney R., Campa M., Fitzgerald M., Patz E.; "Characterizing phase variations in MALDI-TOF data and correcting them by peak alignment"; Cancer Informatics 2005: 1(1) 32-40

See Also

Examples

  # load "Data_IMAC.Rdata" file containing raw MS spectra 'X'  
  if (!file.exists("Data_IMAC.Rdata")) example("msc.project.read")
  load("Data_IMAC.Rdata")
  
  # run on 3D input data using long syntax
  out = msc.mass.adjust.calc (X)
  Y   = msc.mass.adjust.apply(X, out$ShiftX, out$ScaleY, out$ShiftY)
  stopifnot(  mean(out$ShiftX)==-0.15, abs(mean(out$ScaleY)-0.98)<0.01 )
  
  # check what happened to means
  Z   = cbind(colMeans(X), colMeans(Y))
  colnames(Z) = c("copy 1 before", "copy 2 before", "copy 1 after", "copy 2 after" )
  cat("Sample means before and after:\n")
  Z
  
  # check what happened to sample correlation
  A = msc.sample.correlation(X, PeaksOnly=TRUE)
  B = msc.sample.correlation(Y, PeaksOnly=TRUE)
  cat("Mean correlation between two copies of the same sample:\n")
  cat(" before: ", mean(A$innerCor)," after: ", mean(B$innerCor), "\n")
  cat("Mean correlation between unrelated samples:\n")
  cat(" before: ", mean(A$outerCor)," after: ", mean(B$outerCor), "\n")
  
  # run on 2D input data using short syntax 
  # check what happened to means and medians
  Y = msc.mass.adjust(X[,,1], scalePar=2)
  Z = cbind(colMeans(X[,,1]), apply(X[,,1],2,median), colMeans(Y), apply(Y,2,median))
  colnames(Z) = c("means before", "medians before", "means after", "medians after" )
  Z
  Y = msc.mass.adjust(X[,,1], scalePar=1)
  Z = cbind(colMeans(X[,,1]), apply(X[,,1],2,median), colMeans(Y), apply(Y,2,median))
  colnames(Z) = c("means before", "medians before", "means after", "medians after" )
  Z
  
  # mass adjustment for train and test sets, where test set is normalized in 
  # the same way as train set was
  Xtrain = X[, 1:10,]
  Xtest  = X[,11:20,]
  out    = msc.mass.adjust.calc (Xtrain)
  Xtrain = msc.mass.adjust.apply(Xtrain, out$ShiftX, out$ScaleY, out$ShiftY)
  out    = msc.mass.adjust.calc (Xtest , AvrSamp=out$AvrSamp)
  Xtest  = msc.mass.adjust.apply(Xtest , out$ShiftX, out$ScaleY, out$ShiftY)  

[Package caMassClass version 1.6 Index]