cbc.read.table {colbycol}    R Documentation

Reads a huge text file and creates a colbycol object

Description

cbc.read.table reads text data files that are too large to load with read.table because of memory limits. It reads the file line by line, splits it into one physical file per column, reads those files back into R one at a time, and saves each of them as an efficient, native R data file.

Usage

cbc.read.table(file, tmp.dir = tempfile( pattern = "dir" ), sep = "\t", header = TRUE, ...)

Arguments

file       file to be loaded
tmp.dir    path to the (empty) directory where temporary files are stored
sep        field separator for the data in file (a tab by default)
header     whether file contains headers or not (TRUE by default)
...        other parameters passed to read.table internally

Details

This function invokes a Python script that reads file line by line, splits each line into tokens at sep, and stores each column in a separate text file in tmp.dir. These files are then read into R one by one, and each text file is replaced by a native R data file written with save. The function returns a small object containing the required metadata, while the data itself stays in tmp.dir.
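The column-by-column idea described above can be sketched in base R alone. This is only an illustration under stated assumptions, not the package's implementation: the real function delegates the splitting step to a Python script, and the helper name split_by_column, the file naming scheme, and the layout of the returned list are all hypothetical.

```r
# Hypothetical base-R sketch of the column-by-column approach.
# Assumes a non-empty, well-formed file with the same number of fields
# on every line. Known limitations of this sketch: strsplit() drops a
# trailing empty field, and R limits the number of open connections,
# so very wide files would need the columns processed in batches.
split_by_column <- function(path, tmp.dir, sep = "\t", header = TRUE) {
  dir.create(tmp.dir, recursive = TRUE, showWarnings = FALSE)
  con <- file(path, open = "r")
  on.exit(close(con), add = TRUE)

  # The first line fixes the number of columns (and their names, if any).
  first <- strsplit(readLines(con, n = 1), sep, fixed = TRUE)[[1]]
  col.names <- if (header) first else paste0("V", seq_along(first))
  txt.files <- file.path(tmp.dir, sprintf("col_%03d.txt", seq_along(first)))
  outs <- lapply(txt.files, file, open = "w")

  # Append the i-th token of a line to the i-th column file.
  write_tokens <- function(tokens) {
    for (i in seq_along(outs)) writeLines(tokens[i], outs[[i]])
  }

  n.rows <- 0L
  if (!header) {
    write_tokens(first)
    n.rows <- 1L
  }
  # Only one line of the input is held in memory at any time.
  while (length(line <- readLines(con, n = 1)) > 0) {
    write_tokens(strsplit(line, sep, fixed = TRUE)[[1]])
    n.rows <- n.rows + 1L
  }
  invisible(lapply(outs, close))

  # Read each column back on its own and replace the text file
  # with a native R data file written by save().
  rda.files <- sub("\\.txt$", ".rda", txt.files)
  for (i in seq_along(txt.files)) {
    column <- read.table(txt.files[i], sep = sep, header = FALSE,
                         quote = "", comment.char = "",
                         blank.lines.skip = FALSE,
                         stringsAsFactors = FALSE)[[1]]
    save(column, file = rda.files[i])
    unlink(txt.files[i])
  }

  # A small metadata object; the data itself stays on disk.
  list(tmp.dir = tmp.dir, col.names = col.names,
       files = rda.files, n.rows = n.rows)
}

# Usage on a tiny tab-separated file.
demo.file <- tempfile(fileext = ".txt")
writeLines(c("id\tname", "1\ta", "2\tb", "3\tc"), demo.file)
meta <- split_by_column(demo.file, tempfile(pattern = "dir"))
load(meta$files[1])  # restores the object named `column`
column
```

The returned list plays the role of the small metadata object: only one column is ever loaded at a time, which is what keeps memory use low.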

It is advisable to provide the full path to tmp.dir: if a relative path is used and the working directory is later modified, the temporary files can no longer be found. If no temporary directory is provided, a temporary one is created, and it is erased when the R session ends.
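A short base-R sketch of why a full path is safer. The directory name is generated only for illustration, and the commented-out call uses a hypothetical input file; whether cbc.read.table expects the directory to exist beforehand is not stated here, so that line is an assumption as well.

```r
# Create a directory via a relative path, then record its full path.
old.wd <- setwd(tempdir())
rel.path <- basename(tempfile(pattern = "dir"))  # illustrative name
dir.create(rel.path)
abs.path <- normalizePath(rel.path, winslash = "/", mustWork = TRUE)

# Once the working directory changes, the relative path normally stops
# resolving, while the full path still points at the same directory.
setwd(old.wd)
dir.exists(rel.path)
dir.exists(abs.path)

# Assumed usage with a hypothetical input file:
# cbc.data <- cbc.read.table("big_file.txt", tmp.dir = abs.path)
```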

Take care when passing extra arguments to the internal calls to read.table via ....
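One general reason for caution can be shown in base R. The wrapper below, read_one_column, is hypothetical and is not the package's internal code; it only assumes that an internal read.table call already fixes some arguments itself. Extra arguments that do not clash are forwarded normally, while repeating an argument the wrapper already sets raises an error.

```r
# Hypothetical wrapper that fixes `header` and forwards everything else.
read_one_column <- function(path, ...) {
  read.table(path, header = FALSE, ...)[[1]]
}

col.file <- tempfile(fileext = ".txt")
writeLines(c("1", "NULL", "3"), col.file)

# A non-clashing argument is passed through to read.table.
values <- read_one_column(col.file, na.strings = "NULL")

# Repeating an argument the wrapper already sets fails at call time.
clash <- tryCatch(
  read_one_column(col.file, header = TRUE),
  error = function(e) conditionMessage(e)
)
clash
```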

Value

An object of class colbycol containing the metadata required to access the data read from the original file; the data itself is stored in tmp.dir.

Author(s)

Carlos J. Gil Bellosta

References

None

See Also

read.table, save

Examples

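    # Load the tab-separated sample file shipped with the package
    cbc.data <- cbc.read.table( system.file("data", "cbc.test.data.txt", package = "colbycol"), sep = "\t" )
    # Query the number of rows and the column names
    nrow( cbc.data )
    colnames( cbc.data )
    # Retrieve a single column, by position or by name
    col.01 <- cbc.get.col( cbc.data, 1)
    col.02 <- cbc.get.col( cbc.data, "col02" )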

[Package colbycol version 0.4]