generic methods {birch} | R Documentation |
A description of the generic methods that are compatible with birch objects.
## S3 method for class 'birch':
print(x, ...)
## S3 method for class 'birch':
summary(object, ...)
## S3 method for class 'birch':
x[i, j]
## S3 method for class 'birch':
dim(x)
## S3 method for class 'birch':
length(x)
## S3 method for class 'birch':
cbind(scalar, x)
## S3 method for class 'birch':
plot(x, centers=FALSE, ...)
## S3 method for class 'birch':
pairs(x, centers=FALSE, ...)
## S3 method for class 'birch':
points(x, ...)
## S3 method for class 'birch':
lines(x, col, ...)
x, object | A birch object
i, j | Indices specifying which subclusters and dimensions to extract.
scalar | A scalar to add as a column to the birch object.
centers | A Boolean. If TRUE, then just plots the centers of the subclusters. See the details for more information.
col | Color argument for the lines method.
... | Pass additional information to the respective methods.
Most of these methods do not require further explanation, as their behaviour is fairly standard. The plot and pairs methods, however, do something a little different.
They accept the same arguments as plot (the arguments are simply passed on to e.g. plot.xy), but instead of drawing points or lines for the individual values, which is not possible with a birch object, they draw an ellipse for each subcluster. Each ellipse is based on the subcluster's covariance structure, under the assumption that the underlying data are multivariate normal, and is formed (using the ellipse package) at the 95% confidence level.
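The construction of one such ellipse can be sketched in base R. This is only an illustration under stated assumptions: the values of `center` and `covmat` are made up, and the outline is computed directly from an eigen-decomposition rather than with the ellipse package that the methods themselves use.

```r
# Illustrative sketch: 95% ellipse for a single subcluster under normality.
# `center` and `covmat` are hypothetical values, not taken from a birch object.
center <- c(1, 2)
covmat <- matrix(c(2.0, 0.6,
                   0.6, 1.0), nrow = 2)

level  <- 0.95
radius <- sqrt(qchisq(level, df = 2))   # Mahalanobis radius of the 95% contour

# Map the unit circle through a square root of the covariance matrix.
eig    <- eigen(covmat, symmetric = TRUE)
root   <- eig$vectors %*% diag(sqrt(eig$values))
theta  <- seq(0, 2 * pi, length.out = 200)
circle <- rbind(cos(theta), sin(theta))

# 200 x 2 matrix of points on the ellipse (center is recycled down columns).
outline <- t(center + radius * (root %*% circle))

plot(outline, type = "l", asp = 1, xlab = "x1", ylab = "x2")
points(center[1], center[2], pch = 3)   # mark the subcluster center
```

Every point on `outline` lies at the same Mahalanobis distance from `center`, which is what makes the curve a constant-density contour of the assumed normal distribution.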
However, if there are many subclusters the plot becomes “messy”, so it is possible to plot just the center of each subcluster by setting the centers argument to TRUE. The points and lines methods add the centers and ellipses, respectively, to an existing plot.
The dim method returns a vector of length three containing the total number of observations in the tree, the number of columns of the data set, and the number of subclusters. The length method returns just the number of subclusters. Note that both of these commands operate on the object given as an argument, and do not check whether the tree has been updated (e.g. by adding data with the birch.addToTree command).
The summary method returns the mean and covariance of the whole birch object, which are equivalent (and, as it turns out, identical) to those of the whole underlying data set.
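Why the two agree can be illustrated in base R. The sketch below assumes that each subcluster is summarised by its count, its column sums and its matrix of cross-products, which is how BIRCH clustering features are commonly described; the internal layout of a birch object is not spelled out here, and the names `group`, `N`, `LS` and `SS` are purely illustrative.

```r
# Illustrative sketch: pooling per-subcluster summaries recovers the
# mean and covariance of the full data set exactly.
set.seed(1)
z     <- matrix(rnorm(200 * 3), ncol = 3)
group <- sample(1:5, nrow(z), replace = TRUE)  # stand-in for subcluster membership

# Assumed per-subcluster summaries: count, column sums, cross-products.
N  <- tabulate(group, nbins = 5)
LS <- rowsum(z, group)
SS <- lapply(split(seq_len(nrow(z)), group),
             function(idx) crossprod(z[idx, , drop = FALSE]))

# Pool the summaries across subclusters.
n_total     <- sum(N)
pooled_mean <- colSums(LS) / n_total
pooled_ss   <- Reduce(`+`, SS)
pooled_cov  <- (pooled_ss - n_total * tcrossprod(pooled_mean)) / (n_total - 1)

# Compare with the statistics computed from the raw data.
stopifnot(isTRUE(all.equal(pooled_mean, colMeans(z))))
stopifnot(isTRUE(all.equal(pooled_cov, cov(z))))
```

Because counts, sums and cross-products are additive, the result does not depend on how the observations were split into subclusters.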
The ‘[i,j]’ method selects the i-th subcluster and the j-th column of the birch object, and behaves much like the usual indexing of matrices. Similarly, the cbind method effectively inserts a column containing a scalar (recycled to the appropriate length) in front of the data set. See the note for a caveat with these methods.
The print, plot, pairs, points and lines methods return nothing.
Care should be taken when using the indexing, as birch(z, 1)[,1:2] will not produce the same result as birch(z[,1:2], 1). Similarly, there are no guarantees for cbind.
Justin Harrington harringt@stat.ubc.ca and Matias Salibian-Barrera matias@stat.ubc.ca
Harrington, J and Salibian-Barrera, M (2007) “Finding Approximate Solutions to Combinatorial Problems with Very Large Datasets using BIRCH”, submitted to Statistical Algorithms and Software, 2nd Special Issue Computational Statistics and Data Analysis. A draft can be found at http://www.stat.ubc.ca/~harringt/birch/birch.pdf.
Harrington, J and Salibian-Barrera, M (2008) “birch: Working with very large data sets”, submitted to Journal of Statistical Software. A draft can be found at http://www.stat.ubc.ca/~harringt/birch/birch-jss.pdf.
## Load demo birch object
data(birchObj)
dim(birchObj)
## Insert a column of ones in front of the data
newObj <- cbind(1, birchObj)
dim(newObj)
## Drop the first subcluster and the second column
dim(birchObj[-1, -2])