birch-package {birch}    R Documentation

Working with very large data sets using BIRCH.

Description

The functions in this package are designed for working with very large data sets. The data set is first pre-processed with an algorithm called BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), which transforms it into compact, locally similar subclusters, each with attached summary statistics (called clustering features). Subsequent analyses can then use these summary statistics in place of the full data set.
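To illustrate why summary statistics can stand in for the full data, the following is a minimal sketch in base R. It assumes that each clustering feature holds the number of observations, their linear sum, and their sum of cross-products, which is the usual BIRCH formulation. The chunk labels are a hypothetical stand-in for real subclusters, and none of this code uses or reproduces the package's own tree-building routine.

```r
# Hypothetical stand-in for BIRCH subclusters: arbitrary chunks of rows.
# (The real algorithm builds locally similar subclusters with a tree.)
set.seed(1)
x <- matrix(rnorm(1000 * 3), ncol = 3)
chunk <- rep(1:20, each = 50)

# One clustering feature per chunk: count, linear sum, cross-product sum.
cf <- lapply(split(seq_len(nrow(x)), chunk), function(idx) {
  xi <- x[idx, , drop = FALSE]
  list(n = nrow(xi), ls = colSums(xi), ss = crossprod(xi))
})

# Clustering features are additive, so merging subclusters is summation.
n_tot  <- sum(vapply(cf, function(f) f$n, integer(1)))
ls_tot <- Reduce(`+`, lapply(cf, function(f) f$ls))
ss_tot <- Reduce(`+`, lapply(cf, function(f) f$ss))

# Mean and covariance recovered from the summaries alone.
mean_cf <- ls_tot / n_tot
cov_cf  <- (ss_tot - tcrossprod(ls_tot) / n_tot) / (n_tot - 1)

# Compare with the values computed from the full data set.
all.equal(mean_cf, colMeans(x))
all.equal(cov_cf, cov(x))
```

Because the three components simply add up when subclusters are merged, the mean and covariance of any union of subclusters are available without revisiting the raw observations.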

This approach is most advantageous in two situations: when the data set is too large to be loaded into memory, and/or when some form of combinatorial optimization is required and the size of the solution space makes finding global maxima or minima difficult.

A complete explanation of this package is given in Harrington and Salibian-Barrera (2008), and discussion of the underlying algorithms can be found in Harrington and Salibian-Barrera (2007).

Documentation for developers can be found in the doc directory of this package. The source code also contains Doxygen tags that provide further information.

Details

The main function is:
birch: takes a data set (an R object, text file, etc.) and creates a birch object

Various generic methods are provided, including:
print
summary
plot

Finally, some combinatorial-style problems have been implemented. These include:
covMcd.birch: Minimum Covariance Determinant (robust estimator for location and dispersion)
lts.birch: Least Trimmed Squares (robust regression estimator)
rlga.birch: Robust Linear Grouping Analysis (robust clustering about hyperplanes)
kmeans.birch: k-means
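As a rough illustration of how a combinatorial objective can be evaluated on subclusters instead of individual observations, the base R sketch below computes the k-means within-group sum of squares for one candidate grouping from per-subcluster summaries only. The subcluster labels and the candidate assignment are hypothetical, and this is not the implementation used by kmeans.birch. It only shows that the objective can be recovered from counts, linear sums and sums of squares when each subcluster is assigned as a whole.

```r
set.seed(2)
x <- rbind(matrix(rnorm(200 * 2), ncol = 2),
           matrix(rnorm(200 * 2, mean = 5), ncol = 2))
sub <- rep(1:8, each = 50)                 # hypothetical subcluster labels
grp_of_sub <- c(1, 1, 1, 1, 2, 2, 2, 2)    # candidate assignment, k = 2

# Per-subcluster summaries: count, linear sum, total sum of squares.
n  <- as.vector(table(sub))
ls <- rowsum(x, sub)
ss <- as.vector(rowsum(rowSums(x^2), sub))

# Within-group sum of squares using the summaries only.
wss_from_cf <- function(groups) {
  sum(vapply(unique(groups), function(g) {
    m <- groups == g
    sum(ss[m]) - sum(colSums(ls[m, , drop = FALSE])^2) / sum(n[m])
  }, numeric(1)))
}
wss_cf <- wss_from_cf(grp_of_sub)

# The same objective computed from the raw observations.
row_grp <- grp_of_sub[sub]
wss_full <- sum(vapply(unique(row_grp), function(g) {
  xg <- x[row_grp == g, , drop = FALSE]
  sum(sweep(xg, 2, colMeans(xg))^2)
}, numeric(1)))

all.equal(wss_cf, wss_full)
```

A search over assignments of 8 subclusters is far smaller than a search over assignments of 400 observations, which is the kind of reduction in solution space the Description refers to.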

Author(s)

Justin Harrington <harringt@stat.ubc.ca> and Matias Salibian-Barrera <matias@stat.ubc.ca>

References

Harrington, J and Salibian-Barrera, M (2007) “Finding Approximate Solutions to Combinatorial Problems with Very Large Datasets using BIRCH”, submitted to Statistical Algorithms and Software, 2nd Special Issue Computational Statistics and Data Analysis. A draft can be found at http://www.stat.ubc.ca/~harringt/birch/birch.pdf.

Harrington, J and Salibian-Barrera, M (2008) “birch: Working with very large data sets”, submitted to Journal of Statistical Software. A draft can be found at http://www.stat.ubc.ca/~harringt/birch/birch-jss.pdf.

See Also

birch, covMcd.birch, lts.birch, rlga.birch, plot.birch.


[Package birch version 1.1-3 Index]