generic methods {birch}    R Documentation

Generic methods for birch objects

Description

A description of the generic methods that are compatible with birch objects.

Usage

## S3 method for class 'birch':
print(x, ...)
## S3 method for class 'birch':
summary(object, ...)
## S3 method for class 'birch':
x[i, j]
## S3 method for class 'birch':
dim(x)
## S3 method for class 'birch':
length(x)
## S3 method for class 'birch':
cbind(scalar, x)
## S3 method for class 'birch':
plot(x, centers=FALSE, ...)
## S3 method for class 'birch':
pairs(x, centers=FALSE, ...)
## S3 method for class 'birch':
points(x, ...)
## S3 method for class 'birch':
lines(x, col, ...)

Arguments

x, object    A birch object.
i, j         Indices specifying which subclusters and dimensions to extract.
scalar       A scalar to add as a column to the birch object.
centers      A logical. If TRUE, only the centers of the subclusters are plotted. See the Details section for more information.
col          Color argument for the lines method.
...          Additional arguments passed on to the respective methods.

Details

Most of these methods do not require further explanation, as their behaviour is fairly standard. The plot and pairs methods, however, do something a little different.

These methods accept the same arguments as plot (the arguments are simply passed on to e.g. plot.xy). However, plotting the individual observations as points or lines is not possible with a birch object, so these methods instead draw an ellipse for each subcluster, based on its covariance structure and on the assumption that the underlying data are multivariate normal. The ellipses are formed (using the ellipse package) at the 95% confidence level.

However, if there are many subclusters, this plot gets “messy”, so it is possible to plot just the center of each subcluster by setting the centers argument to TRUE. The points and lines methods add the centers and ellipses, respectively, to an existing plot.
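
As a rough illustration of the ellipses described above, the following base R sketch computes the 95% contour of a bivariate normal distribution from a center and a covariance matrix. It is not the package's plotting code (which uses the ellipse package); the function ellipse95 and the values of center and covmat are hypothetical and chosen only for illustration.

```r
## Hypothetical subcluster summary: a center and a 2 x 2 covariance matrix
center <- c(1, 2)
covmat <- matrix(c(2, 0.8, 0.8, 1), nrow = 2)

## Points on the contour of a bivariate normal at the given level
## (illustrative helper, not part of the birch package)
ellipse95 <- function(center, covmat, level = 0.95, npoints = 100) {
  radius <- sqrt(qchisq(level, df = 2))      # Mahalanobis radius of the contour
  theta  <- seq(0, 2 * pi, length.out = npoints)
  circle <- cbind(cos(theta), sin(theta))    # points on the unit circle
  ## chol() returns an upper-triangular R with t(R) %*% R == covmat,
  ## so circle %*% R maps the unit circle onto the covariance shape
  sweep(radius * circle %*% chol(covmat), 2, center, "+")
}

pts <- ellipse95(center, covmat)
plot(pts, type = "l", xlab = "x1", ylab = "x2")   # the ellipse
points(center[1], center[2], pch = 3)             # its center
```

Every point returned lies at the same Mahalanobis distance from the center, which is what makes the curve a constant-density contour under the normality assumption.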

Value

The dim method returns a vector of length three containing the total number of observations in the tree, the number of columns of the data set, and the number of subclusters. The length method just returns the number of subclusters. Note that both of these commands operate on the object given as an argument, and do not check if the tree has been updated (e.g. by adding data with the birch.addToTree command).
The summary method returns the mean and covariance of the whole birch object - equivalently (and, as it turns out, identically) the whole underlying data set.
The ‘[i,j]’ method selects the i-th subcluster and the j-th column of the birch object, and behaves much like the usual indexing of matrices. Similarly, the cbind method effectively inserts a column containing a scalar (recycled to the appropriate length) in front of the data set. See the Note section for a caveat with these methods.
The print, plot, pairs, points and lines methods return nothing.
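
The claim that the summary method reproduces the mean and covariance of the underlying data exactly can be illustrated with a small base R sketch. It assumes that each subcluster keeps a count, a vector of column sums and a cross-product matrix (the usual BIRCH clustering feature); this is an assumption about the stored summaries, not the package's internal code, and the grouping in groups is hypothetical.

```r
set.seed(1)
z <- matrix(rnorm(200 * 3), ncol = 3)   # simulated data set
groups <- rep(1:4, each = 50)           # hypothetical subcluster labels

## Per-subcluster summaries: count, column sums, cross-product matrix
cf <- lapply(split(seq_len(nrow(z)), groups), function(idx) {
  zi <- z[idx, , drop = FALSE]
  list(n = nrow(zi), lin_sum = colSums(zi), sq_sum = crossprod(zi))
})

## Pool the summaries across subclusters
n       <- sum(sapply(cf, function(s) s$n))
lin_sum <- Reduce(`+`, lapply(cf, function(s) s$lin_sum))
sq_sum  <- Reduce(`+`, lapply(cf, function(s) s$sq_sum))

## Mean and covariance recovered from the pooled summaries alone
pooled_mean <- lin_sum / n
pooled_cov  <- (sq_sum - tcrossprod(lin_sum) / n) / (n - 1)

## Compare with the values computed from the raw data
all.equal(pooled_mean, colMeans(z))
all.equal(pooled_cov, cov(z))
```

Because counts, sums and cross-products are additive, the result does not depend on how the observations were assigned to subclusters.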

Note

Care should be taken when indexing, as

birch(z, 1)[,1:2]
will not produce the same result as
birch(z[,1:2], 1)
Similarly, there are no guarantees for cbind.

Author(s)

Justin Harrington harringt@stat.ubc.ca and Matias Salibian-Barrera matias@stat.ubc.ca

References

Harrington, J and Salibian-Barrera, M (2007) “Finding Approximate Solutions to Combinatorial Problems with Very Large Datasets using BIRCH”, submitted to Statistical Algorithms and Software, 2nd Special Issue Computational Statistics and Data Analysis. A draft can be found at http://www.stat.ubc.ca/~harringt/birch/birch.pdf.

Harrington, J and Salibian-Barrera, M (2008) “birch: Working with very large data sets”, submitted to Journal of Statistical Software. A draft can be found at http://www.stat.ubc.ca/~harringt/birch/birch-jss.pdf.

See Also

birch

Examples

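## Load demo birch object
data(birchObj)
## Number of observations, columns and subclusters
dim(birchObj)
## Insert a constant column in front of the data set
newObj <- cbind(1, birchObj)
dim(newObj)

## Drop the first subcluster and the second column
dim(birchObj[-1, -2])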

[Package birch version 1.1-3 Index]