msc.msfiles.read.csv {caMassClass}          R Documentation

Read Protein Mass Spectra from CSV files

Description

Read multiple protein mass spectra (SELDI) files, listed in FileList, from a given directory and combine them into a single data structure. Files are in CSV format, possibly compressed. If FileList is a 1D list, then the data is stored as a matrix with one file per column. If FileList is a 2D data frame, then the data is stored in a 3D array.

Usage

  msc.msfiles.read.csv(directory=".", FileList="\\.csv", 
                       SampleNames=NULL, CopyNames=NULL)

Arguments

directory A character vector giving the name of the directory where all the files can be found. Use "/" as the separator in the directory name. The default, ".", corresponds to the working directory returned by getwd().
FileList List of files to read. The list can be in any of the following formats:
  • single string - a regular expression (see regex) used to select the files to read, for example "\\.csv"
  • list - list of file names to be read
  • data.frame (multiple lists of file names) - used when multiple copies of the same samples are present; see Details
The last two formats also support zip and gzip file compression. For example, an individual file name can be in one of these formats:
  • "dir/a.csv" - uncompressed file 'a.csv' in directory 'dir'
  • "dir/b.zip/a.csv" - file 'a.csv' within zipped file 'b.zip'
  • "dir/a.csv.gz" - gzipped individual file
SampleNames Optional list of names to be used as sample/column names.
CopyNames Optional list of names to be used as copy/plane names in case FileList is a 2D data frame.
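
The three file-name formats accepted in FileList can be illustrated with base R connections. The sketch below is not the package's own code: read_one_spectrum is a hypothetical helper, and it assumes the files have no header row. It only shows one way such names could be mapped onto file(), unz() and gzfile().

```r
# Hypothetical helper (not part of caMassClass): open one file name written
# in any of the three formats and return its contents as a data frame.
read_one_spectrum <- function(path) {
  if (grepl("\\.zip/", path)) {
    # "dir/b.zip/a.csv": split into the archive and the member inside it
    parts   <- strsplit(path, "\\.zip/")[[1]]
    archive <- paste0(parts[1], ".zip")
    con     <- unz(archive, parts[2])
  } else if (grepl("\\.gz$", path)) {
    con <- gzfile(path)   # "dir/a.csv.gz"
  } else {
    con <- file(path)     # "dir/a.csv"
  }
  # read.csv opens the connection and closes it again when done;
  # header = FALSE is an assumption about the file layout
  read.csv(con, header = FALSE)
}

# Demo on temporary files: the same two-column table, plain and gzipped
spec  <- data.frame(mz = c(1000, 1001, 1002), intensity = c(0.5, 1.5, 0.7))
plain <- file.path(tempdir(), "a.csv")
gz    <- file.path(tempdir(), "a.csv.gz")
write.table(spec, plain, sep = ",", row.names = FALSE, col.names = FALSE)
gz_out <- gzfile(gz, "w")
write.table(spec, gz_out, sep = ",", row.names = FALSE, col.names = FALSE)
close(gz_out)

from_plain <- read_one_spectrum(plain)
from_gz    <- read_one_spectrum(gz)
identical(from_plain, from_gz)
```

The zip branch is not exercised in the demo because creating a zip archive from R depends on an external zip program being installed.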

Details

All files should be in Excel's CSV format (a table in text format: one row per line, comma-delimited columns). Each file is assumed to have two columns. In the case of SELDI data, column 1 (x-axis) is mass/charge (M/Z) and column 2 (y-axis) is spectrum intensity. All files are assumed to have an identical first (M/Z) column.
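
The file layout described above can be sketched with base R alone. This is an illustration, not the package implementation: the file names, the demo_dir directory, and the choice to use M/Z values as row names are all assumptions made for the example, as is the absence of a header row.

```r
# Write three small two-column spectra that share the same M/Z axis
mz       <- c(1000, 1001, 1002, 1003)
demo_dir <- file.path(tempdir(), "spectra_demo")
dir.create(demo_dir, showWarnings = FALSE)
files <- file.path(demo_dir, c("s1.csv", "s2.csv", "s3.csv"))
for (i in seq_along(files)) {
  write.table(cbind(mz, i * c(1, 2, 3, 4)), files[i], sep = ",",
              row.names = FALSE, col.names = FALSE)
}

# Read every file; column 1 is M/Z, column 2 is intensity
spectra <- lapply(files, read.csv, header = FALSE)

# Check the assumption that all files have the same first (M/Z) column
ref_mz    <- spectra[[1]][[1]]
same_axis <- vapply(spectra,
                    function(s) isTRUE(all.equal(s[[1]], ref_mz)),
                    logical(1))
stopifnot(all(same_axis))

# Combine: one column per file, one row per M/Z value
X <- sapply(spectra, function(s) s[[2]])
rownames(X) <- ref_mz
colnames(X) <- basename(files)
dim(X)
```

Checking the M/Z column before combining matters because the intensities are stacked by row position, so files with different axes would be silently misaligned.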

If multiple copies of the same sample were collected, then one can store them in a 3D array (data cube) where each column corresponds to a single sample, each row is a single mass (M/Z) and each plane is a single copy. To do so, pass a 2D data frame as FileList in which each row contains the file names of all copies of one sample and each column contains the file names of one copy of the different samples. This matches the 2D example in the Examples section, where SampleNames has one entry per row of FileList and CopyNames has one entry per column.
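
The data-cube layout can be sketched as follows. This is a hand-built illustration rather than the package's code: the file names, sample names and intensities are invented, and the FileList orientation (one row per sample, one column per copy) is assumed from the usage shown in the Examples section.

```r
# Hypothetical 2D FileList: one row per sample, one column per copy
FileList <- data.frame(copy1 = c("s1_a.csv", "s2_a.csv", "s3_a.csv"),
                       copy2 = c("s1_b.csv", "s2_b.csv", "s3_b.csv"),
                       stringsAsFactors = FALSE)
mz       <- c(1000, 1001, 1002, 1003)
demo_dir <- file.path(tempdir(), "cube_demo")
dir.create(demo_dir, showWarnings = FALSE)

# Create the demo files with made-up intensities that encode sample j and copy k
for (j in seq_len(nrow(FileList))) {
  for (k in seq_len(ncol(FileList))) {
    write.table(cbind(mz, 100 * j + 10 * k + seq_along(mz)),
                file.path(demo_dir, FileList[j, k]), sep = ",",
                row.names = FALSE, col.names = FALSE)
  }
}

# Fill a (nFeatures x nSamples x nCopies) array, one file per [, j, k] slice
X <- array(NA_real_,
           dim = c(length(mz), nrow(FileList), ncol(FileList)),
           dimnames = list(as.character(mz), c("s1", "s2", "s3"),
                           colnames(FileList)))
for (j in seq_len(nrow(FileList))) {
  for (k in seq_len(ncol(FileList))) {
    spectrum  <- read.csv(file.path(demo_dir, FileList[j, k]), header = FALSE)
    X[, j, k] <- spectrum[[2]]
  }
}
dim(X)
X[, "s2", "copy1"]   # all M/Z values for the first copy of sample s2
```

With the samples on the second dimension and the copies on the third, a single copy of all samples is the 2D matrix X[, , k], which has the same shape as the matrix produced for a 1D FileList.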

Value

Data structure containing all the data read from the files. Depending on the input, it is either a 2D matrix (nFeatures x nSamples) or a 3D array (nFeatures x nSamples x nCopies).

Author(s)

Jarek Tuszynski (SAIC) jaroslaw.w.tuszynski@saic.com

See Also

Examples

  # example of mode "single string" FileList
  directory  = system.file("Test", package = "caMassClass")
  X = msc.msfiles.read.csv(directory, "IMAC_normal_.*csv")
  dim(X)
  
  # example of explicit 1D FileList
  ProjectFile = file.path(directory,"InputFiles.csv")
  FileList = read.csv(file=ProjectFile, comment.char = "")
  FileList[,3]
  X = msc.msfiles.read.csv(directory, FileList=FileList[,3], SampleNames=FileList[,1])
  dim(X)
  
  # example of explicit 2D FileList
  FileList[,3:4]
  X = msc.msfiles.read.csv(directory, FileList=FileList[,3:4], 
        SampleNames=FileList[,1], CopyNames=c("copy 1", "copy 2"))
  dim(X)

[Package caMassClass version 1.1 Index]