run.analysis {FTICRMS}    R Documentation

Test for Significant Peaks in FT-ICR MS by Controlling FDR

Description

Takes the file generated by run.cluster.matrix and tests the peaks using Benjamini-Hochberg to control the False Discovery Rate.

Usage

run.analysis(form, covariates, FDR = 0.1, norm.post.repl = FALSE, 
             norm.peaks = c("common", "all", "none"), normalization, 
             add.norm = TRUE, repl.method = "max", use.model = "lm",
             pval.fcn = "default", lrg.only = TRUE, masses = NA,
             isotope.dist = 7, root.dir = ".", lrg.dir,
             lrg.file = "lrg_peaks.RData", res.dir,
             res.file = "analyzed.RData", overwrite = FALSE,
             use.par.file = FALSE, par.file = "parameters.RData",
             bhbysubj = TRUE, subs, ...)

Arguments

form            object of class “formula” to be used by use.model for testing using covariates
covariates      data frame containing covariates used in analysis
FDR             False Discovery Rate in Benjamini-Hochberg test
norm.post.repl  logical; whether to normalize after combining replicates
norm.peaks      which peaks to use in normalization
normalization   type of normalization to use on spectra before statistical analysis; kept for compatibility (see below)
add.norm        logical; whether to normalize additively or multiplicatively on the log scale
repl.method     function or string representing the name of a function; how to deal with replicates
use.model       function or string representing the name of a function; what test to apply to data
pval.fcn        function to extract p-values; default is overall p-value of test
lrg.only        logical; whether to consider only peaks that have at least one “large” peak; i.e., identified by run.lrg.peaks
masses          specific masses to test
isotope.dist    maximum distance for declaring isotopes
root.dir        directory for parameters file and raw data
lrg.dir         directory for large peaks file; default is paste(root.dir, "/Large_Peaks", sep = "")
lrg.file        name of file to store large peaks in
res.dir         directory for results file; default is paste(root.dir, "/Results", sep = "")
res.file        name for results file
overwrite       logical; whether to replace existing files with new ones
use.par.file    logical; if TRUE, then parameters are read from par.file in directory root.dir
par.file        string containing name of parameters file
bhbysubj        logical; whether to look for number of large peaks by subject (i.e., combining replicates) or by spectrum
subs            subset of spectra to use for analysis; see below
...             additional parameters to be passed to use.model

Details

Reads in information from file created by run.cluster.matrix and creates a file named res.file in directory res.dir which contains the following variables:
amps            matrix of transformed amplitudes of alignment peaks
bysubjvar       a vector which tells which rows of covariates are identified as the same subject
centers         matrix of calculated masses of alignment peaks
clust.mat       matrix of transformed amplitudes of peaks used in statistical testing
min.FDR         FDR level required to get at least one significant test given the starting set of peaks
sigs            matrix containing all tests which are significant under at least one scenario
which.sig       matrix containing all peaks tested
parameter.list  if use.par.file = TRUE, a list generated by extract.pars; otherwise not defined

Value

No value returned; the file is simply created.

Note

If use.par.file == TRUE and other parameters are entered into the function call, then the parameters entered in the function call override those read in from the file. Note that this is the opposite of the behavior in FTICRMS versions 0.7 and earlier.
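The precedence rule can be pictured with a minimal sketch in base R. The two lists below are hypothetical stand-ins for the contents of par.file and for the arguments given in the call; this is not the package's internal code, only an illustration of "call values win".

```r
# Hypothetical values standing in for what par.file might contain
file.pars <- list(FDR = 0.1, repl.method = "max", lrg.only = TRUE)

# Hypothetical values supplied explicitly in the function call
call.pars <- list(FDR = 0.05)

# Entries named in the call replace the same-named entries from the file;
# everything else is taken from the file unchanged
effective <- modifyList(file.pars, call.pars)
str(effective)
```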

norm.peaks determines the peaks used for normalization: "common" normalizes each spectrum using the average peak height of the alignment peaks from that spectrum in amps; "all" normalizes each spectrum using the average peak height of all peaks in that spectrum; "none" performs no normalization.

normalization is obsolete but is included for compatibility with previous versions of the package. The valid normalization schemes translate to the new scheme as follows: "common" is norm.post.repl = FALSE and norm.peaks = "common"; "postbase" is norm.post.repl = FALSE and norm.peaks = "all"; "postrepl" is norm.post.repl = TRUE and norm.peaks = "all"; and "none" is norm.peaks = "none" (and norm.post.repl = FALSE, although this value is irrelevant).
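For readers migrating old scripts, the translation can be written out as a small lookup. The helper name translate.normalization is hypothetical and is not part of FTICRMS; it simply restates the mapping given in the paragraph above.

```r
# Hypothetical helper (not part of FTICRMS): translate a legacy
# `normalization` value into the newer pair of arguments.
translate.normalization <- function(normalization) {
  switch(normalization,
    common   = list(norm.post.repl = FALSE, norm.peaks = "common"),
    postbase = list(norm.post.repl = FALSE, norm.peaks = "all"),
    postrepl = list(norm.post.repl = TRUE,  norm.peaks = "all"),
    # norm.post.repl is irrelevant when no normalization is done
    none     = list(norm.post.repl = FALSE, norm.peaks = "none"),
    stop("unknown normalization scheme: ", normalization)
  )
}

str(translate.normalization("postrepl"))
```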

Replicates for the same subject are assumed to be determined by the unique values of covariates$subj. (Future implementations will allow for other methods of defining this.) To analyze replicates as independent samples, use repl.method = "none". This will also speed up the run time if there are no replicates in the data set.
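As a rough illustration of combining replicates with repl.method = "max", the sketch below groups spectra by covariates$subj and takes the per-peak maximum. The data are made up, and the layout (rows are spectra, columns are peaks) is an assumption for this example; the package may organize its matrices differently.

```r
# Made-up covariates: subjects A and C each have two replicate spectra
covariates <- data.frame(subj  = c("A", "A", "B", "C", "C"),
                         group = c("case", "case", "control", "case", "case"),
                         stringsAsFactors = FALSE)

# Made-up amplitudes; assumed layout: one row per spectrum, one column per peak
amps <- matrix(c(1.0, 2.0,
                 3.0, 1.5,
                 0.5, 0.7,
                 2.2, 2.1,
                 2.0, 4.0),
               ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("peak1", "peak2")))

# repl.method may be a function or the name of one; match.fun handles both
repl.fun <- match.fun("max")

# For each subject, apply repl.fun to each peak across that subject's spectra
subjects <- unique(covariates$subj)
combined <- t(sapply(subjects, function(s) {
  apply(amps[covariates$subj == s, , drop = FALSE], 2, repl.fun)
}))
combined
```

The result has one row per subject, so subject B (no replicates) is passed through unchanged.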

The argument subs can be a logical, numeric, or character vector; if it is supplied, then covariates is replaced by covariates[subs, , drop = FALSE].
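The three kinds of index behave as in ordinary data frame subsetting. The sketch below uses a made-up covariates data frame with hypothetical row names to show each form.

```r
# Made-up covariates with hypothetical row names identifying the spectra
covariates <- data.frame(subj  = c("A", "A", "B", "C"),
                         group = c("case", "case", "control", "case"),
                         row.names = c("s1", "s2", "s3", "s4"),
                         stringsAsFactors = FALSE)

# Logical subs: keep the rows where the condition is TRUE
by.logical <- covariates[covariates$group == "case", , drop = FALSE]

# Numeric subs: keep rows by position
by.numeric <- covariates[c(1, 3), , drop = FALSE]

# Character subs: keep rows by row name
by.name <- covariates[c("s1", "s3"), , drop = FALSE]
```

Using drop = FALSE keeps the result a data frame even if only one column or row remains.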

If masses is supplied, then the listed masses, plus anything that could be in the first isotope.dist - 1 isotope peaks of each mass, are tested.

If something other than the p-value for the overall test statistic is needed, then the user-defined function for pval.fcn should have the form pval.fcn = function(x){...}, where x is a model object of the type returned by use.model, and it should return the desired p-value.
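A minimal sketch of such a function, assuming use.model = "lm": the hypothetical extractor coef.pval returns the p-value of the first non-intercept coefficient rather than the overall F-test p-value. The data are made up, and the fit is done directly with lm so the example stands alone.

```r
# Hypothetical extractor: p-value for the first non-intercept coefficient
# of an lm fit (row 2 of the coefficient table)
coef.pval <- function(x) {
  summary(x)$coefficients[2, "Pr(>|t|)"]
}

# Made-up data for one peak: two groups of three samples each
d <- data.frame(y     = c(1.1, 0.9, 1.3, 2.0, 2.4, 2.2),
                group = factor(rep(c("a", "b"), each = 3)))
fit <- lm(y ~ group, data = d)

# p-value from the user-defined extractor
p <- coef.pval(fit)

# For comparison: the overall F-test p-value of the same fit
f <- summary(fit)$fstatistic
overall <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
```

With a single two-level predictor the two p-values coincide; they differ once the formula contains more than one term, which is when a custom pval.fcn becomes useful.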

If use.model evaluates to t.test, then the difference between the two groups for each peak is recorded in which.sig$Delta and sigs$Delta; otherwise, these columns consist entirely of NA entries.

Each rowname of sigs and which.sig represents the range of masses that were used to form that peak. The columns of those objects give the p-value of the peaks in each row, the number of samples that had large peaks for each row, and the significance of each test, coded as
NA  peak not eligible for B-H
0   peak eligible for B-H but not declared significant
1   peak declared significant
The “S” labels refer to the number of large peaks that were necessary for a row to be eligible. For example, the column labeled S5 in sigs used as its starting set of p-values all rows which had which.sig$num.lrg >= 5. If bhbysubj == TRUE, then the entries of num.lrg are obtained by going subject-by-subject and for each mass counting the number of subjects who had at least one spectrum with a large peak at that mass; otherwise, num.lrg for each mass is simply the total number of spectra that had a large peak at that mass.
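The counting rule and the NA/0/1 coding can be sketched in base R as follows. All data are made up, bh.code is a hypothetical helper, and the use of p.adjust is an assumption for illustration; the package's internal implementation may differ.

```r
# Made-up indicator matrix: one row per spectrum, one column per mass;
# TRUE means that spectrum had a large peak at that mass
subj <- c("A", "A", "B", "B", "C")
is.lrg <- matrix(c(TRUE,  FALSE, TRUE,
                   TRUE,  FALSE, FALSE,
                   FALSE, TRUE,  TRUE,
                   FALSE, FALSE, TRUE,
                   TRUE,  FALSE, TRUE),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(NULL, c("m1", "m2", "m3")))

# bhbysubj == FALSE: count spectra with a large peak at each mass
num.lrg.spec <- colSums(is.lrg)

# bhbysubj == TRUE: count subjects with at least one such spectrum
any.by.subj <- rowsum(is.lrg * 1, group = subj) > 0
num.lrg.subj <- colSums(any.by.subj)

# Hypothetical helper: code each peak as NA (not eligible), 0 (eligible,
# not significant), or 1 (significant) for a given eligibility threshold
bh.code <- function(pvals, num.lrg, min.lrg, FDR = 0.1) {
  eligible <- num.lrg >= min.lrg
  code <- rep(NA_integer_, length(pvals))
  # Benjamini-Hochberg is applied only to the eligible p-values
  adj <- p.adjust(pvals[eligible], method = "BH")
  code[eligible] <- as.integer(adj <= FDR)
  code
}

pvals <- c(0.001, 0.20, 0.04)            # made-up p-values for m1, m2, m3
S1 <- bh.code(pvals, num.lrg.subj, min.lrg = 1)
S2 <- bh.code(pvals, num.lrg.subj, min.lrg = 2)
```

Here m2 is eligible but not significant under S1 and drops out (NA) under S2, because only one subject had a large peak at that mass.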

Author(s)

Don Barkauskas (barkda@wald.ucdavis.edu)

References

Barkauskas, D.A. and D.M. Rocke. (2009a) “A general-purpose baseline estimation algorithm for spectroscopic data”. To appear in Analytica Chimica Acta. doi:10.1016/j.aca.2009.10.043

Barkauskas, D.A. et al. (2009b) “Analysis of MALDI FT-ICR mass spectrometry data: A time series approach”. Analytica Chimica Acta, 648:2, 207–214.

Barkauskas, D.A. et al. (2009c) “Detecting glycan cancer biomarkers in serum samples using MALDI FT-ICR mass spectrometry data”. Bioinformatics, 25:2, 251–257.

Benjamini, Y. and Hochberg, Y. (1995) “Controlling the false discovery rate: a practical and powerful approach to multiple testing.” J. Roy. Statist. Soc. Ser. B, 57:1, 289–300.

See Also

run.strong.peaks


[Package FTICRMS version 0.8 Index]