rlga.birch {birch}    R Documentation

Finding the Robust/Non-Robust LGA solution using BIRCH

Description

Performs Linear Grouping Analysis (LGA) and Robust Linear Grouping Analysis (RLGA) using a BIRCH object.

Usage

lga.birch(birchObject, k, nsamp=100)
rlga.birch(birchObject, k, alpha=0.5, nsamp=100)

Arguments

birchObject an object created by the function birch.
k the number of clusters
alpha numeric parameter controlling the size of the subsets over which the orthogonal residuals are minimized, i.e., alpha*n observations are used when calculating orthogonal residuals in the hyperplane calculations. Allowed values are between 0.5 and 1 and the default is 0.5.
nsamp number of subsets used for initial estimates
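
As an illustration of what alpha controls, the following base-R sketch keeps only the alpha*n smallest squared orthogonal residuals. The residual values are invented for the example, and the rounding rule (ceiling) is an assumption; the package may round differently.

```r
## Invented squared orthogonal residuals for n = 10 observations
res2  <- c(0.2, 3.1, 0.05, 1.4, 0.6, 9.8, 0.3, 0.9, 2.2, 0.1)
alpha <- 0.5
n     <- length(res2)

## Size of the retained subset (assumed rounding rule)
h <- ceiling(alpha * n)

## Indices of the h observations with the smallest residuals
keep <- order(res2)[seq_len(h)]

## Trimmed sum of squares: only the retained observations contribute
trimmed_ss <- sum(res2[keep])
```

With alpha = 1 every observation is retained, which corresponds to the non-robust case.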

Details

Robust Linear Grouping (Garcia-Escudero et al., 2007) is the robust version of LGA (Van Aelst et al., 2006); both methods cluster observations around hyperplanes. The non-BIRCH versions can be found in the package lga, which is also available on CRAN.

The functions documented here are the equivalent algorithms adapted for use with a BIRCH object. For further details, please see Harrington and Salibian-Barrera (2007).
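
To make "clustering around hyperplanes" concrete, here is a minimal base-R sketch of fitting a single hyperplane by orthogonal regression and measuring orthogonal residuals. The helpers fit_hyperplane and orth_resid are hypothetical names written for this illustration; they are not part of the birch or lga packages and work on raw data rather than on a BIRCH object.

```r
## Fit a hyperplane by orthogonal regression: the normal vector is the
## eigenvector of the covariance matrix with the smallest eigenvalue.
fit_hyperplane <- function(x) {
  x <- as.matrix(x)
  e <- eigen(cov(x), symmetric = TRUE)   # eigenvalues in decreasing order
  list(centre = colMeans(x),
       normal = e$vectors[, ncol(x)])
}

## Signed orthogonal distance of each row of x to the fitted hyperplane
orth_resid <- function(x, fit) {
  drop(sweep(as.matrix(x), 2, fit$centre) %*% fit$normal)
}

## Simulated points scattered around the line y = 2x + 1
set.seed(42)
u <- rnorm(200)
x <- cbind(u, 2 * u + 1 + rnorm(200, sd = 0.1))

fit <- fit_hyperplane(x)
r   <- orth_resid(x, fit)
ss  <- sum(r^2)   # sum of squared orthogonal distances for this one group
```

In a grouping analysis, one such hyperplane is fitted per cluster and each point is assigned to the hyperplane it is closest to.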

Value

Returns a list containing:

clust A list containing a vector giving the cluster to which each subcluster is assigned (sub) and a vector giving the cluster to which each underlying observation is assigned (obs). For the robust algorithm, a value of zero indicates that the subcluster or observation does not belong to the best h-subset.
ROSS the residual sum of squares of orthogonal distances to the fitted hyperplanes based on the best data set.
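
The sketch below shows one way to inspect a result with this structure. The list out is a hand-made mock with invented values that only mimics the documented components (clust$sub, clust$obs, ROSS); it is not output from rlga.birch.

```r
## Mock result with the documented structure (all values are invented)
out <- list(
  clust = list(sub = c(1, 2, 0, 1, 2),              # one label per subcluster
               obs = c(1, 1, 2, 2, 2, 0, 1, 2, 2)), # one label per observation
  ROSS  = 3.7
)

## Cluster sizes; label 0 counts observations outside the best h-subset
sizes <- table(out$clust$obs)

## Positions of the observations that were trimmed by the robust fit
trimmed <- which(out$clust$obs == 0)

## Plotting colours: shift labels by one so that label 0 maps to colour 1
cols <- out$clust$sub + 1
```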

Note

For this algorithm to produce meaningful results, the birch object should contain at least several hundred subclusters, and preferably thousands.

Author(s)

Justin Harrington harringt@stat.ubc.ca and Matias Salibian-Barrera matias@stat.ubc.ca

References

Garcia-Escudero, L.A. and Gordaliza, A. and San Martin, R. and Van Aelst, S. and Zamar, R. (2007) “Robust Linear Clustering”, Unpublished Manuscript.

Harrington, J. and Salibian-Barrera, M. (2007) “Finding Approximate Solutions to Combinatorial Problems with Very Large Datasets using BIRCH”, submitted to Statistical Algorithms and Software, 2nd Special Issue Computational Statistics and Data Analysis. A draft can be found at http://www.stat.ubc.ca/~harringt/birch/birch.pdf.

Harrington, J. and Salibian-Barrera, M. (2008) “birch: Working with very large data sets”, submitted to Journal of Statistical Software. A draft can be found at http://www.stat.ubc.ca/~harringt/birch/birch-jss.pdf.

Van Aelst, S. and Wang, X. and Zamar, R.H. and Zhu, R. (2006) “Linear grouping using orthogonal regression”, Computational Statistics & Data Analysis 50, 1287–1312.

See Also

birch, and the non-birch algorithms lga and rlga

Examples

library(MASS) ## for mvrnorm
library(birch)

## Create a new data set (one that is more applicable to RLGA and LGA)
set.seed(1234) 
x <- mvrnorm(1e4, mu=rep(0,2), Sigma=diag(c(0.25,1),2))
x <- rbind(x, mvrnorm(1e4, mu=rep(10,2),
                      Sigma=diag(c(5,0.5),2)))

## Create the birch object and check its size (see Note)
birchObj <- birch(x, 0.5)
length(birchObj)

## Robust LGA; trimmed subclusters are labelled 0, so add 1 to get a valid colour
rlgaOut <- rlga.birch(birchObj, k=2, alpha=0.5)
plot(birchObj, col=rlgaOut$clust$sub+1)

## Non-robust LGA
lgaOut <- lga.birch(birchObj, k=2)
plot(birchObj, col=lgaOut$clust$sub)

[Package birch version 1.1-3 Index]