refdata {ref} | R Documentation |
Function refdata creates objects of class refdata, which behave similarly (though not identically) to matrices or data.frames but allow much more memory-efficient handling.
# -- usage for R CMD CHECK, see below for human readable version -----------
refdata(x)
derefdata(x)
derefdata(x) <- value
## S3 method for class 'refdata':
x[i = NULL, j = NULL, drop = FALSE, ref = FALSE]
## S3 method for class 'refdata':
x[i = NULL, j = NULL, ref = FALSE] <- value
## S3 method for class 'refdata':
dim(x)
## S3 method for class 'refdata':
dimnames(x)
## S3 method for class 'refdata':
row.names(x)
## S3 method for class 'refdata':
names(x)

# -- most important usage for human beings --------------------------------
# rd <- refdata(x)                   # create reference
# derefdata(rd)                      # retrieve original data
# derefdata(rd) <- value             # modify original data
# rd[]                               # get all (current) data
# rd[i, j]                           # get part of data
# rd[i, j, ref=TRUE]                 # get new reference on part of data
# rd[i, j] <- value                  # modify part of data (now rd is reference on local copy of the data)
# rd[i, j, ref=TRUE] <- value        # modify part of original data (respecting subsetting history)
# dim(rd)                            # dim of (subsetted) data
# dimnames(rd)                       # dimnames of (subsetted) data
x |
a matrix or data.frame or any other 2-dimensional object that has operators "[" and "[<-" defined |
i |
row index |
j |
col index |
ref |
FALSE by default. In subsetting: FALSE returns data, TRUE returns new refdata object. In assignments: FALSE modifies a local copy and returns a refdata object embedding it, TRUE modifies the original. |
drop |
FALSE by default, i.e. returned data always have a dimension attribute. TRUE drops dimensions in some cases; the exact result depends on whether a matrix or data.frame is embedded |
value |
some value to be assigned |
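The dependence of drop on the embedded object can be seen in base R alone. The following is a minimal sketch on a plain matrix and a plain data.frame (no refdata object is involved; the variable names are illustrative only):

```r
m  <- cbind(a = 1:3, b = 4:6)        # plain matrix
df <- data.frame(a = 1:3, b = 4:6)   # plain data.frame

# drop = FALSE: both results keep a dimension attribute
dim(m[1, , drop = FALSE])
dim(df[1, , drop = FALSE])

# drop = TRUE: a single matrix row becomes a vector without dim ...
r_m <- m[1, , drop = TRUE]
is.null(dim(r_m))

# ... while a single data.frame row becomes a plain list
r_df <- df[1, , drop = TRUE]
is.list(r_df)

# a single data.frame column becomes an atomic vector
c_df <- df[, 1, drop = TRUE]
```

With drop=FALSE both object types return something with two dimensions, which is why it is the more predictable setting.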
Refdata objects store 2D-data in one environment and index information in another environment. Derived refdata objects usually share the data environment but not the index environment.
The index information is stored in a standardized and memory efficient form generated by optimal.index.
Thus refdata objects can be copied and subsetted and even modified without duplicating the data in memory.
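The idea of a shared data environment with a private index environment can be sketched in base R. This is not the package's code: the helpers new_ref, subset_ref and deref are hypothetical, and a plain list is used where refdata uses attributes.

```r
# Hypothetical helpers illustrating the two-environment layout.
new_ref <- function(x) {
  dat <- new.env()            # data environment, shared by derived references
  dat$x <- x
  ind <- new.env()            # index environment, private to each reference
  ind$i <- seq_len(nrow(x))
  ind$j <- seq_len(ncol(x))
  list(dat = dat, ind = ind)
}

# Derive a reference on a subset: reuse dat, create a fresh ind.
subset_ref <- function(r, i, j) {
  ind <- new.env()
  ind$i <- r$ind$i[i]
  ind$j <- r$ind$j[j]
  list(dat = r$dat, ind = ind)
}

# Materialise the data a reference currently points to.
deref <- function(r) r$dat$x[r$ind$i, r$ind$j, drop = FALSE]

r1 <- new_ref(cbind(1:5, 5:1))
r2 <- subset_ref(r1, -1, 1:2)

identical(r1$dat, r2$dat)   # same data environment, nothing was copied
identical(r1$ind, r2$ind)   # separate index environments
dim(deref(r2))
```

Because environments are passed by reference in R, r1 and r2 point to the same stored matrix; only the small index vectors differ.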
Empty square bracket subsetting (rd[]) returns the data; square bracket subsetting (rd[i, j]) returns subsets of the data as expected.
An additional argument (rd[i, j, ref=TRUE]) makes it possible to get a reference that stores the subsetting indices. Such a reference behaves transparently as if a smaller matrix/data.frame were stored and can be subsetted again recursively.
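Recursive subsetting of a reference amounts to composing index vectors rather than copying data. A minimal base R sketch of that composition, assuming indices are kept as positive integer positions (the package's standardized form from optimal.index may differ):

```r
x <- cbind(1:5, 5:1)

# First subsetting step: drop row 1, so the stored row index is 2:5
i1 <- seq_len(nrow(x))[-1]

# Second step, applied relative to the first: drop its first row again
i2 <- i1[-1]

# Indexing the original once with the composed index gives the same
# result as physically subsetting twice
once  <- x[i2, , drop = FALSE]
twice <- x[-1, , drop = FALSE][-1, , drop = FALSE]
identical(once, twice)
```

Only i1 and i2 are created along the way; the matrix x itself is read once, at the very end.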
With ref=TRUE indices are always interpreted as row/col indices, i.e. x[i] and x[cbind(i, j)] are undefined (and raise stop errors).
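For comparison, this is what single-index and matrix-index subsetting mean on plain base R objects. The sketch involves no refdata object; it only shows that the two forms are interpreted differently for matrices and data.frames, which matters when they are applied to extracted data.

```r
m  <- cbind(a = 1:3, b = 4:6)
df <- data.frame(a = 1:3, b = 4:6)

# Single index: element of a matrix (column-major), column of a data.frame
e_m  <- m[2]     # second element of the matrix
c_df <- df[2]    # second column, still a data.frame

# Matrix index: each row of ij is one (row, col) pair
ij <- cbind(c(1, 3), c(2, 1))
v_m  <- m[ij]    # elements at (1, 2) and (3, 1)
v_df <- df[ij]   # same positions, taken from as.matrix(df)
```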
Standard square bracket assignment (rd[i, j] <- value) creates a reference to a locally modified copy of the (potentially subsetted) data.
An additional argument (rd[i, j, ref=TRUE] <- value) allows modifying the original data, properly recognizing the subsetting history.
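The difference between the two kinds of assignment can be sketched in base R with an environment holding the data and an index vector recording the subsetting history. This is an illustration under those assumptions, not the package's implementation; all variable names are hypothetical.

```r
dat <- new.env()
dat$x <- cbind(1:5, 5:1)

# Subsetting history: drop row 1, then drop the first remaining row
i <- seq_len(nrow(dat$x))[-1][-1]   # refers to original rows 3:5
j <- seq_len(ncol(dat$x))

before <- dat$x[3, 1]

# Analogue of ref = FALSE: modify a local copy of the indexed data
local_copy <- dat$x[i, j, drop = FALSE]
local_copy[1, 1] <- 99L
unchanged <- dat$x[3, 1]            # original is untouched

# Analogue of ref = TRUE: translate local position (1, 1) into the
# original position (i[1], j[1]) and assign inside the shared environment
dat$x[i[1], j[1]] <- 99L
changed <- dat$x[3, 1]              # original now holds the new value
```

The translation through i and j is what "recognizing the subsetting history" refers to: position (1, 1) of the subset is position (3, 1) of the original.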
A method dim(refdata) returns the dim of the (indexed) data.
A method dimnames(refdata) returns the dimnames of the (indexed) data.
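How dim and dimnames can report on the indexed data without materialising it can be sketched with a small S3 class. The class miniref and its fields are hypothetical and only mimic the behaviour described here.

```r
# Hypothetical class "miniref": data plus stored row/col positions.
miniref <- function(x, i = seq_len(nrow(x)), j = seq_len(ncol(x))) {
  structure(list(x = x, i = i, j = j), class = "miniref")
}

# dim: derived from the lengths of the stored indices
dim.miniref <- function(x) {
  u <- unclass(x)
  c(length(u$i), length(u$j))
}

# dimnames: the full dimnames, indexed by the stored positions
dimnames.miniref <- function(x) {
  u <- unclass(x)
  dn <- dimnames(u$x)
  list(dn[[1]][u$i], dn[[2]][u$j])
}

m <- matrix(1:6, nrow = 3,
            dimnames = list(c("r1", "r2", "r3"), c("a", "b")))
r <- miniref(m, i = 2:3, j = 2)

dim(r)        # dimensions of the indexed part only
dimnames(r)   # names of the indexed rows and columns only
```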
an object of class refdata (appended to class attributes of data), which is an empty list with two attributes
dat |
the environment where the data x and its dimension dim are stored |
ind |
the environment where the indexes i, j and the effective subset sizes ni, nj are stored |
The refdata code is currently R only (not implemented for S+).
Please note the following differences to matrices and dataframes:
x[] |
use x[] instead of x in order to get all current data |
drop=FALSE |
drop is FALSE by default, so returned data always have a dimension attribute |
x[i] |
use x[][i] instead, but beware of differences between matrices and dataframes |
x[cbind()] |
use x[][cbind(i, j)] instead |
ref=TRUE |
ref needs to be used sensibly to exploit the advantages of refdata objects |
Jens Oehlschlägel
Extract, matrix, data.frame, optimal.index, ref
## Simple usage Example
x <- cbind(1:5, 5:1)            # take a matrix or data frame
rx <- refdata(x)                # wrap it into a refdata object
rx                              # see the autoprinting
rm(x)                           # delete original to save memory
rx[]                            # extract all data
rx[-1, ]                        # extract part of data
rx2 <- rx[-1, , ref=TRUE]       # create refdata object referencing part of data (only index, no data is duplicated)
rx2                             # compare autoprinting
rx2[]                           # extract 'all' data
rx2[-1, ]                       # extract part of (part of) data
cat("for more examples look the help pages\n")

## Not run:
# Memory saving demos
square.matrix.size <- 1000
recursion.depth.limit <- 10
non.referenced.matrix <- matrix(1:(square.matrix.size*square.matrix.size), nrow=square.matrix.size, ncol=square.matrix.size)
rownames(non.referenced.matrix) <- paste("a", seq(length=square.matrix.size), sep="")
colnames(non.referenced.matrix) <- paste("b", seq(length=square.matrix.size), sep="")
referenced.matrix <- refdata(non.referenced.matrix)
recurse.nonref <- function(m, depth.limit=10){
  x <- m[1,1]   # need read access here to create local copy
  gc()
  cat("depth.limit=", depth.limit, " memory.size=", memsize.wrapper(), "\n", sep="")
  if (depth.limit)
    Recall(m[-1, -1, drop=FALSE], depth.limit=depth.limit-1)
  invisible()
}
recurse.ref <- function(m, depth.limit=10){
  x <- m[1,1]   # read access, otherwise nothing happens
  gc()
  cat("depth.limit=", depth.limit, " memory.size=", memsize.wrapper(), "\n", sep="")
  if (depth.limit)
    Recall(m[-1, -1, ref=TRUE], depth.limit=depth.limit-1)
  invisible()
}
gc()
memsize.wrapper()
recurse.ref(referenced.matrix, recursion.depth.limit)
gc()
memsize.wrapper()
recurse.nonref(non.referenced.matrix, recursion.depth.limit)
gc()
memsize.wrapper()
rm(recurse.nonref, recurse.ref, non.referenced.matrix, referenced.matrix, square.matrix.size, recursion.depth.limit)
## End(Not run)

cat("for even more examples look at regression.test.refdata()\n")
regression.test.refdata()       # testing correctness of refdata functionality