msc.msfiles.read.csv {caMassClass} | R Documentation |
Read multiple protein mass spectra (SELDI) files, listed in FileList, from a given directory and combine them into a single data structure. Files are in CSV format, possibly compressed. If FileList is a 1D list, then the data are stored as a matrix with one file per column. If FileList is a 2D data frame, then the data are stored in a 3D array.
msc.msfiles.read.csv(directory=".", FileList="\\.csv", SampleNames=NULL, CopyNames=NULL)
directory | a character vector with the name of the directory where all the files can be found. Use "/" slashes in the directory name. The default corresponds to the working directory, getwd(). |
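FileList | List of files to read. The list can be in one of the following formats:
- a single string, treated as a regular expression: all files in directory whose names match it are read (the default "\\.csv" matches every file name containing ".csv");
- a 1D list of file names, read into a 2D matrix with one file per column;
- a 2D data frame of file names, read into a 3D array (one plane per copy). |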
SampleNames |
Optional list of names to be used as sample/column names. |
CopyNames | Optional list of names to be used as copy/plane names when FileList is a 2D data frame. |
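The three FileList forms can be illustrated with a small base-R sketch. The helper resolve_file_list below is hypothetical and is not the caMassClass implementation; it assumes that a single string is used as the pattern argument of dir(), and that a 2D data frame has one row per sample and one column per copy, as in the 2D example at the end of this page.

```r
# Hypothetical helper (not part of caMassClass): map the three FileList
# forms to full file paths.
resolve_file_list <- function(directory = ".", FileList = "\\.csv") {
  if (is.null(dim(FileList)) && length(FileList) == 1) {
    # Single string: used as a regular expression over the file names in
    # 'directory'. Note that a lone explicit file name is matched as a pattern too.
    FileList <- dir(directory, pattern = as.character(FileList))
  }
  if (is.null(dim(FileList))) {
    # 1D list: one path per sample (later one matrix column per file)
    file.path(directory, as.character(FileList))
  } else {
    # 2D data frame: rows are samples, columns are copies
    m <- as.matrix(FileList)
    array(file.path(directory, m), dim = dim(m))
  }
}
```

The single-string case is ambiguous by design: a vector holding one explicit file name is indistinguishable from a pattern, which is why the sketch routes every length-1 input through dir().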
All files should be in Excel's CSV format (a table in text format: one row per line, comma-delimited columns). Each file is assumed to have two columns; for SELDI data, column 1 (x-axis) is mass/charge (M/Z) and column 2 (y-axis) is spectrum intensity. All files are assumed to have an identical first (M/Z) column.
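The file format described above can be sketched in base R as follows. The helper read_spectra_matrix is hypothetical, not the package's code, and it assumes each file starts with a one-line header (the header argument is an assumption). read.csv opens files through file(), which reads gzip, bzip2 and xz compressed files transparently, so compressed spectra need no special handling here.

```r
# Hypothetical helper (not part of caMassClass): read two-column CSV
# spectra (M/Z, intensity) and bind the intensities column-wise.
read_spectra_matrix <- function(files, header = TRUE) {
  # read.csv opens files via file(), which decompresses gzip/bzip2/xz transparently
  tables <- lapply(files, read.csv, header = header)
  mz <- tables[[1]][[1]]                       # M/Z column of the first file
  for (k in seq_along(tables)) {
    # all files must share an identical first (M/Z) column
    if (!isTRUE(all.equal(tables[[k]][[1]], mz))) {
      stop("M/Z column of ", files[k], " differs from that of ", files[1])
    }
  }
  # one intensity column per file: nFeatures x nSamples
  X <- do.call(cbind, lapply(tables, function(t) t[[2]]))
  dimnames(X) <- list(mz, basename(files))
  X
}
```

The check on the M/Z column is what makes a plain column-wise bind meaningful: row i of the result refers to the same mass in every sample.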
If multiple copies of the same sample were collected, then they can be stored in a 3D array (data cube) where each column corresponds to a single sample, each row to a single mass (M/Z) and each plane to a single copy. To do so, pass a 2D data frame as FileList, where each row contains the file names of the multiple copies of one sample and each column contains the file names of a single copy of all samples.
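The data-cube layout can be illustrated with a base-R sketch. The function stack_copies is hypothetical, not caMassClass code, and the caller-supplied read_intensity function stands in for reading the intensity column of one file. The sketch assumes, as in the 2D example at the end of this page, that rows of the file table are samples and columns are copies, so the file in row i, column j becomes column i of plane j.

```r
# Hypothetical sketch (not caMassClass code): build an
# nFeatures x nSamples x nCopies array from a table of file names.
stack_copies <- function(files, read_intensity, nFeatures,
                         SampleNames = NULL, CopyNames = NULL) {
  files <- as.matrix(files)            # rows = samples, columns = copies
  X <- array(NA_real_, dim = c(nFeatures, nrow(files), ncol(files)))
  for (j in seq_len(ncol(files))) {    # plane j holds copy j
    for (i in seq_len(nrow(files))) {  # column i holds sample i
      X[, i, j] <- read_intensity(files[i, j])
    }
  }
  # optional sample (column) and copy (plane) names; masses stay unnamed here
  dimnames(X) <- list(NULL, SampleNames, CopyNames)
  X
}
```

Collapsing the copies, for example with apply(X, c(1, 2), mean), gives back an nFeatures x nSamples matrix.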
Data structure containing all the data read from the files. It is either a 2D matrix (nFeatures x nSamples) or a 3D array (nFeatures x nSamples x nCopies), depending on the form of FileList.
Jarek Tuszynski (SAIC) jaroslaw.w.tuszynski@saic.com
- This function is part of the msc.project.run pipeline.
- msc.project.read gives the user much more flexibility in defining the meaning of the data to be read.
- msc.preprocess.run is often used as the next step in the process.
- read.files from the PROcess library can read a single SELDI file, and rmBaseline can read in a directory of files and subtract their baselines.
- ppc.read.raw.batch and ppc.read.raw.nobatch from the ppc library can also read SELDI files, assuming the correct directory structure.
library(caMassClass)

# example of mode "single string" FileList
directory = system.file("Test", package = "caMassClass")
X = msc.msfiles.read.csv(directory, "IMAC_normal_.*csv")
dim(X)      # 2D: nFeatures x nSamples

# example of explicit 1D FileList
ProjectFile = file.path(directory, "InputFiles.csv")
FileList = read.csv(file = ProjectFile, comment.char = "")
FileList[,3]
X = msc.msfiles.read.csv(directory, FileList = FileList[,3], SampleNames = FileList[,1])
dim(X)      # 2D: nFeatures x nSamples

# example of explicit 2D FileList
FileList[,3:4]
X = msc.msfiles.read.csv(directory, FileList = FileList[,3:4],
        SampleNames = FileList[,1], CopyNames = c("copy 1", "copy 2"))
dim(X)      # 3D: nFeatures x nSamples x nCopies