pkg-trackObjs {trackObjs} | R Documentation |
The trackObjs package sets up a link between R objects in memory and files on disk so that objects are automatically resaved to files when they are changed. R objects in files are read in on demand and do not consume memory prior to being referenced. The trackObjs package also tracks times when objects are created and modified, and caches some basic characteristics of objects to allow for fast summaries of objects.
Each object is stored in a separate RData file using the standard format as used by save(), so that objects can be manually picked out of or added to the trackObjs database if needed.
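Because each object lives in its own save()-format file, a single object can be inspected with base R alone. The following is a minimal sketch, assuming a hypothetical directory `manual_db` that merely stands in for a tracking directory; it is not created or managed by trackObjs, and whether a manually added file is picked up without a rebuild step is not shown here.

```r
# Hypothetical stand-in for a tracking directory (an assumption for this sketch).
db_dir <- file.path(tempdir(), "manual_db")
dir.create(db_dir, showWarnings = FALSE)

# Write one object per file in the standard save() format.
src <- new.env()
assign("x", 123, envir = src)
save(list = "x", file = file.path(db_dir, "x.rda"), envir = src)

# Pick a single object out of the directory. Loading into a fresh
# environment avoids overwriting anything in the current workspace.
e <- new.env()
loaded <- load(file.path(db_dir, "x.rda"), envir = e)
loaded                 # names of the objects the file contained
get("x", envir = e)    # the restored value
```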
Tracking works by replacing a tracked variable by an 'activeBinding', which when accessed looks up information in an associated 'tracking environment' and reads or writes the corresponding RData file and/or gets or assigns the variable in the tracking environment.
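The mechanism can be illustrated with base R's makeActiveBinding(). This is only a sketch of the idea, not the package's implementation: the helper `bind_to_file` is hypothetical, and it skips the tracking environment, the summary bookkeeping and all error handling.

```r
# Hypothetical helper (not part of trackObjs): make `name` in `env` an
# active binding whose value lives in an RData file.
bind_to_file <- function(name, file, env) {
  makeActiveBinding(name, function(value) {
    tmp <- new.env()
    if (missing(value)) {
      # Read access: load the object from disk and return it.
      load(file, envir = tmp)
      get(name, envir = tmp)
    } else {
      # Write access: resave the new value under the same name.
      assign(name, value, envir = tmp)
      save(list = name, file = file, envir = tmp)
      invisible(value)
    }
  }, env)
}

# Set up a file holding `v`, then bind `v` in a demo environment to it.
f <- file.path(tempdir(), "v.rda")
init <- new.env()
assign("v", 1:3, envir = init)
save(list = "v", file = f, envir = init)

demo_env <- new.env()
bind_to_file("v", f, demo_env)
demo_env$v           # read: value comes from the file
demo_env$v <- 4:6    # write: the file is resaved
```

After the assignment, loading the file into a fresh environment gives the new value, which is the "automatically resaved" behaviour described above in miniature.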
There are three main reasons to use the trackObjs package:
save() and load()
There is an option to control whether tracked objects are cached in memory as well as being stored on disk. By default, objects are not cached. To save time when working with collections of objects that will all fit in memory, turn on caching with track.options(cache=TRUE), or start tracking with track.start(..., cache=TRUE).
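The trade-off behind the cache option can be sketched in base R. The helper `make_cached_binding` below is hypothetical and is not how trackObjs is implemented; it only shows that, without a cache, every access reads the file, while with a cache the value is kept in an in-memory environment (playing the role of the tracking environment) as well as on disk.

```r
# Hypothetical helper: file-backed active binding with an optional cache.
make_cached_binding <- function(name, file, env, cache = FALSE) {
  store <- new.env()   # in-memory store, used only when cache = TRUE
  reads <- 0L          # how many times the file has been read
  makeActiveBinding(name, function(value) {
    if (!missing(value)) {
      # Write: always save to disk; keep the in-memory copy only if caching.
      assign(name, value, envir = store)
      save(list = name, file = file, envir = store)
      if (!cache) rm(list = name, envir = store)
      return(invisible(value))
    }
    # Read: use the cached copy if there is one, otherwise go to disk.
    if (exists(name, envir = store, inherits = FALSE))
      return(get(name, envir = store))
    tmp <- new.env()
    load(file, envir = tmp)
    reads <<- reads + 1L
    val <- get(name, envir = tmp)
    if (cache) assign(name, val, envir = store)
    val
  }, env)
  function() reads     # accessor for the read counter
}

e1 <- new.env()
e2 <- new.env()
n1 <- make_cached_binding("a", file.path(tempdir(), "a_nocache.rda"), e1, cache = FALSE)
n2 <- make_cached_binding("a", file.path(tempdir(), "a_cache.rda"), e2, cache = TRUE)
e1$a <- runif(10)
e2$a <- runif(10)
for (i in 1:3) { e1$a; e2$a }
n1()   # file reads without caching
n2()   # file reads with caching
```

In this sketch the uncached binding reads its file on every access, while the cached one never needs to, at the cost of holding the object in memory.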
Here is a brief example of tracking some variables in the global environment:
> library(trackObjs)
> track.start("tmp1")
> x <- 123                  # Not yet tracked
> track(x)                  # Variable 'x' is now tracked
> track(y <- matrix(1:6, ncol=2)) # 'y' is assigned & tracked
> z1 <- list("a", "b", "c")
> z2 <- Sys.time()
> track(list=c("z1", "z2")) # Track a bunch of variables
> track.summary(size=F)     # See a summary of tracked vars
            class    mode extent length            modified TA TW
x         numeric numeric    [1]      1 2007-09-07 08:50:58  0  1
y          matrix numeric  [3x2]      6 2007-09-07 08:50:58  0  1
z1           list    list  [[3]]      3 2007-09-07 08:50:58  0  1
z2 POSIXt,POSIXct numeric    [1]      1 2007-09-07 08:50:58  0  1
> # (TA="total accesses", TW="total writes")
> ls(all=TRUE)
[1] "x"  "y"  "z1" "z2"
> track.stop()              # Stop tracking
> ls(all=TRUE)
character(0)
>
> # Restart using the tracking dir -- the variables reappear
> track.start("tmp1")       # Start using the tracking dir again
> ls(all=TRUE)
[1] "x"  "y"  "z1" "z2"
> track.summary(size=F)
            class    mode extent length            modified TA TW
x         numeric numeric    [1]      1 2007-09-07 08:50:58  0  1
y          matrix numeric  [3x2]      6 2007-09-07 08:50:58  0  1
z1           list    list  [[3]]      3 2007-09-07 08:50:58  0  1
z2 POSIXt,POSIXct numeric    [1]      1 2007-09-07 08:50:58  0  1
> track.stop()
>
> # the files in the tracking directory:
> list.files("tmp1", all=TRUE)
[1] "."                    ".."
[3] "filemap.txt"          ".trackingSummary.rda"
[5] "x.rda"                "y.rda"
[7] "z1.rda"               "z2.rda"
>
There are several points to note:
track()ed - newly created objects are not tracked. (This is not a "feature", but there is currently no way of automatically tracking newly created objects; this is on the wishlist.) Thus, it is possible for variables in a tracked environment to be either tracked or untracked.
save()/load() (RData files).
Six functions cover the majority of common usage of the trackObjs package:
track.start(dir=...): start tracking the global environment, with files saved in dir
track.stop(): stop tracking (any unsaved tracked variables are saved to disk and all tracked variables become unavailable until tracking starts again)
track(x): start tracking x - x in the global environment is replaced by an active binding, and x is saved in its corresponding file in the tracking directory and, if caching is on, in the tracking environment
track(x <- value): assign value to x and start tracking x
track(list=c('x', 'y')): start tracking specified variables
track(all=TRUE): start tracking all untracked variables in the global environment
untrack(x): stop tracking variable x - the R object x is put back as an ordinary object in the global environment
untrack(all=TRUE): stop tracking all variables in the global environment (but tracking is still set up)
untrack(list=...): stop tracking specified variables
track.summary(): print a summary of the basic characteristics of tracked variables: name, class, extent, and creation, modification and access times.
track.remove(x): completely remove all traces of x from the global environment, tracking environment and tracking directory. Note that if variable x in the global environment is tracked, remove(x) will make x an "orphaned" variable: remove(x) just removes the active binding from the global environment and leaves x in the tracking environment and on file, so x will reappear after restarting tracking.
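Why remove(x) leaves an orphan can be seen with plain environments. This is a sketch under stated assumptions: `global_like` and `tracking` are hypothetical stand-ins for the global environment and the tracking environment, and no trackObjs functions are called.

```r
global_like <- new.env()   # stands in for the global environment
tracking    <- new.env()   # stands in for the tracking environment

# The value lives in the tracking environment; the visible variable is
# only an active binding that forwards reads and writes to it.
assign("x", 42, envir = tracking)
makeActiveBinding("x", function(value) {
  if (missing(value)) get("x", envir = tracking)
  else assign("x", value, envir = tracking)
}, global_like)

# Analogous to remove(x) on a tracked variable: only the binding goes away.
rm(list = "x", envir = global_like)

exists("x", envir = global_like, inherits = FALSE)  # the binding is gone
exists("x", envir = tracking, inherits = FALSE)     # the stored value remains
```

Removing the binding never touches the stored value, which is why track.remove(x) is needed to clear all three places.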
The trackObjs package provides many additional functions for controlling how tracking is performed (e.g., whether or not tracked variables are cached in memory), examining the state of tracking (showing which variables are tracked, untracked, orphaned, masked, etc.) and repairing tracking environments and databases that have become inconsistent or incomplete (this may result from resource limitations, e.g., being unable to write a save file due to lack of disk space, or from manual tinkering, e.g., dropping a new save file into a tracking directory).
The functions that can be used to set up and take down tracking are:
track.start(dir=...): start tracking, using the supplied directory
track.stop(): stop tracking (any unsaved tracked variables are saved to disk and all tracked variables become unavailable until tracking starts again)
track.dir(): return the path of the tracking directory
Functions for tracking and stopping tracking variables:
track(x), track(var <- value), track(list=...), track(all=TRUE): start tracking variable(s)
track.load(file=...): load some objects from an RData file into the tracking environment
untrack(x, keep.in.db=FALSE), untrack(list=...), untrack(all=TRUE): stop tracking variable(s) - the value is left in place and, optionally, it is also left in the database
Functions for getting status of tracking and summaries of variables:
track.summary(): return a data frame containing a summary of the basic characteristics of tracked variables: name, class, extent, and creation, modification and access times.
track.status(): return a data frame containing information about the tracking status of variables: whether they are saved to disk or not, etc.
env.is.tracked(): tell whether an environment is currently tracked
The remaining functions allow the user to more closely manage variable tracking, but are less likely to be of use to new users.
Functions for getting status of tracking and summaries of variables:
tracked(): return the names of tracked variables
untracked(): return the names of untracked variables
untrackable(): return the names of variables that cannot be tracked
track.unsaved(): return the names of variables whose copy on file is out-of-date
track.orphaned(): return the names of once-tracked variables that have lost their active binding (should not happen)
track.masked(): return the names of once-tracked variables whose active binding has been overwritten by an ordinary variable (should not happen)
Functions for managing tracking and tracked variables:
track.options(): examine and set options to control tracking
track.remove(): completely remove all traces of a tracked variable
track.save(): write unsaved variables to disk
track.flush(): write unsaved variables to disk, and remove from memory
track.forget(): delete cached versions without saving to file (file version will be retrieved next time the variable is accessed)
track.restart(): reload variable values from disk (can forget all cached vars, remove no-longer existing tracked vars)
track.load(): load variables from a saved RData file into the tracking session
Functions for recovering from errors:
track.rebuild(): rebuild tracking information from objects in memory or on disk
track.flush(): write unsaved variables to disk, and remove from memory
Design and internals of tracking:
Tony Plate <tplate@acm.org>
Roger D. Peng. Interacting with data using the filehash package. R News, 6(4):19-24, October 2006. http://cran.r-project.org/doc/Rnews and http://sandybox.typepad.com/software
David E. Brahm. Delayed data packages. R News, 2(3):11-12, December 2002. http://cran.r-project.org/doc/Rnews
Design of the trackObjs package.
Potential future features of the trackObjs package.
Documentation for save and load (in 'base' package).
Documentation for makeActiveBinding and related functions (in 'base' package).
Inspiration from the packages g.data and filehash.