cbc.read.table {colbycol}    R Documentation
cbc.read.table can read text data files far larger than read.table can handle within the available memory.
It reads the file line by line, splits it into one physical file per column, reads these files back into R one at a time, and saves them as efficient native R data files.
cbc.read.table(file, tmp.dir = tempfile( pattern = "dir" ), sep = "\t", header = TRUE, ...)
file: the text file to be read.
tmp.dir: path to the (empty) directory where the temporary files are stored.
sep: field separator used in file.
header: whether the first line of file contains column names.
...: further arguments passed to the internal calls to read.table.
This function invokes a Python script that reads file line by line, splits each line into fields at sep, and writes each column to its own text file in tmp.dir.
These text files are then read into R one at a time, and each is replaced by a native R data file written with save.
The function returns a small object containing only the metadata needed to access the data; the data itself stays in tmp.dir.
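The package performs the splitting step with a Python script. As an illustration only, the sketch below reproduces the same column-by-column idea in base R. It is not colbycol's code: the functions split_by_column and get_column, the chunk.size parameter, and the class "colbycol_sketch" are hypothetical, and the sketch assumes a simple delimited file with no quoting, no occurrence of sep inside a field, no empty trailing fields, and the same number of fields on every line. Unlike the package, it reads each column back with readLines and type.convert rather than read.table.

```r
# Hypothetical base-R sketch of the column-by-column strategy; NOT the
# colbycol implementation (which uses a Python script for splitting).
# Assumptions: no quoting, `sep` never occurs inside a field, no empty
# trailing fields, and every line has the same number of fields.
split_by_column <- function(file, tmp.dir, sep = "\t", header = TRUE,
                            chunk.size = 10000L) {
  dir.create(tmp.dir, showWarnings = FALSE, recursive = TRUE)
  con <- file(file, open = "r")
  on.exit(close(con))

  col.names <- NULL
  txt.files <- NULL
  repeat {
    lines <- readLines(con, n = chunk.size)  # only one chunk in memory
    if (length(lines) == 0L) break
    fields <- strsplit(lines, sep, fixed = TRUE)
    if (is.null(txt.files)) {                # first chunk: one file per column
      n.cols <- length(fields[[1]])
      if (header) {
        col.names <- fields[[1]]
        fields <- fields[-1]
      } else {
        col.names <- sprintf("V%d", seq_len(n.cols))
      }
      txt.files <- file.path(tmp.dir, sprintf("col%03d.txt", seq_len(n.cols)))
      file.create(txt.files)
    }
    # Append field i of every line to the text file of column i.
    for (i in seq_along(txt.files)) {
      values <- vapply(fields, `[`, character(1), i)
      cat(paste0(values, "\n"), file = txt.files[i], sep = "", append = TRUE)
    }
  }

  # Read each column back on its own and replace it with a native .RData file.
  rdata.files <- sub("\\.txt$", ".RData", txt.files)
  for (i in seq_along(txt.files)) {
    x <- type.convert(readLines(txt.files[i]), as.is = TRUE)
    save(x, file = rdata.files[i])
    file.remove(txt.files[i])
  }
  # Only metadata is returned; the data stays on disk in tmp.dir.
  structure(list(files = rdata.files, col.names = col.names),
            class = "colbycol_sketch")
}

# Load a single column, by position or by name.
get_column <- function(obj, i) {
  if (is.character(i)) i <- match(i, obj$col.names)
  env <- new.env()
  load(obj$files[i], envir = env)
  env$x
}

# Usage on a tiny tab-separated file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tname\tscore", "1\ta\t2.5", "2\tb\t3.0", "3\tc\t4.5"), tmp)
obj <- split_by_column(tmp, tempfile(pattern = "dir"), chunk.size = 2L)
get_column(obj, 1)       # 1 2 3
get_column(obj, "name")  # "a" "b" "c"
```

Reading in chunks keeps memory use bounded by chunk.size lines during the split, and each column is later loaded independently, which is what allows a file larger than memory to be processed.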
It is advisable to give tmp.dir as a full (absolute) path: if the working directory changes later, a relative path no longer points to the temporary files, and they cannot be found.
If no temporary directory is provided, a temporary one is created and is deleted when the R session ends.
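A relative tmp.dir is resolved against whatever the working directory is when it is used, so fixing it as an absolute path up front avoids this problem. The minimal base-R illustration below uses hypothetical directory names (cbc_tmp, elsewhere) created under a fresh temporary directory; the resulting abs.dir is the kind of value one would pass as tmp.dir.

```r
# Illustration only: the directory names are hypothetical.
base <- tempfile(pattern = "work")
dir.create(file.path(base, "cbc_tmp"), recursive = TRUE)
dir.create(file.path(base, "elsewhere"))

old.wd <- setwd(base)
rel.dir <- "cbc_tmp"                 # relative: depends on getwd()
abs.dir <- normalizePath("cbc_tmp")  # absolute: resolved once, now

setwd("elsewhere")                   # the working directory changes
rel.ok <- dir.exists(rel.dir)        # FALSE: looks for elsewhere/cbc_tmp
abs.ok <- dir.exists(abs.dir)        # TRUE
setwd(old.wd)                        # restore the original working directory
```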
Extra arguments can be passed to the internal calls to read.table via ..., but this should be done with caution.
An object of class colbycol holding the metadata needed to access the data of the original file; the data itself is stored in tmp.dir.
Author: Carlos J. Gil Bellosta
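# Read the example file shipped with the package, column by column
cbc.data <- cbc.read.table( system.file("data", "cbc.test.data.txt", package = "colbycol"), sep = "\t" )
nrow( cbc.data )
colnames( cbc.data )
# Columns can be retrieved by position or by name
col.01 <- cbc.get.col( cbc.data, 1 )
col.02 <- cbc.get.col( cbc.data, "col02" )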