kmeans.birch {birch}    R Documentation
Perform the k-means clustering algorithm using a birch object.
kmeans.birch(birchObject, centers, nstart = 1)
birchObject: An object created by the function birch.
centers: Either the number of clusters or a set of initial (distinct) cluster centres. If a number, a random set of (distinct) subclusters in ‘birchObject’ is chosen as the initial centres.
nstart: If ‘centers’ is a number, the number of random sets of initial centres to try.
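The random initialisation used when ‘centers’ is a number can be sketched in base R. This is a hypothetical illustration, not the package's code. It assumes each subcluster is summarised by its point count n and linear sum LS (the sum of its points, one row per subcluster), and it takes the centroid LS/n of each chosen subcluster as a starting centre. The helper name draw_initial_centres is made up for this sketch.

```r
# Hypothetical sketch (not part of the birch package): draw initial centres as
# the centroids of k distinct, randomly chosen subclusters. Assumes each
# subcluster is summarised by its count `n` and linear sum `LS` (one row each).
draw_initial_centres <- function(n, LS, k) {
  stopifnot(length(n) == nrow(LS), k <= nrow(LS))
  picked <- sample.int(nrow(LS), k)          # k distinct subcluster indices
  LS[picked, , drop = FALSE] / n[picked]     # centroid = linear sum / count
}

# With nstart > 1, one such draw is made per start.
set.seed(1)
n  <- c(4, 2, 5, 3)
LS <- rbind(c(4, 8), c(10, 2), c(0, 5), c(9, 9))
starts <- replicate(3, draw_initial_centres(n, LS, 2), simplify = FALSE)
```

In the usual k-means convention (for example, stats::kmeans), the full iteration is run from every start and the lowest-RSS result is kept. This page does not say how kmeans.birch chooses among its starts.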
The birch object given by ‘birchObject’ is clustered by the k-means method, adapted to work with birch objects. The aim is to partition the subclusters into k groups so as to minimize the sum of squared distances from all the points in each subcluster to the centre of the group to which that subcluster is assigned.
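This objective can be evaluated without the raw points if each subcluster j is summarised by the standard BIRCH clustering feature: its count n_j, linear sum LS_j and sum of squared norms SS_j. That these are the summaries used here is an assumption for illustration. Expanding the squared norm gives the first line below. Setting the gradient of the total with respect to c_g to zero gives the second line: each centre is the mean of all points in the subclusters S_g assigned to group g.

```latex
\sum_{i=1}^{n_j} \lVert x_{ji} - c \rVert^2 = SS_j - 2\, c^\top LS_j + n_j \lVert c \rVert^2,
\qquad LS_j = \sum_{i=1}^{n_j} x_{ji}, \quad SS_j = \sum_{i=1}^{n_j} \lVert x_{ji} \rVert^2

\mathrm{RSS} = \sum_{g=1}^{k} \sum_{j \in S_g} \left( SS_j - 2\, c_g^\top LS_j + n_j \lVert c_g \rVert^2 \right),
\qquad c_g = \frac{\sum_{j \in S_g} LS_j}{\sum_{j \in S_g} n_j}
```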
The result should approximate that found by performing k-means on the original data set. However, the quality of this approximation depends on the “coarseness” of the underlying tree and on the size of the combinatorial problem. These aspects are discussed in detail in the references.
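Whole subclusters are assigned, so the points of one subcluster can never be split between clusters. This is one reason the result only approximates k-means on the raw data. The base-R sketch below is a hypothetical illustration, not the package's implementation. The functions subcluster_cost and kmeans_subclusters are made up. It runs Lloyd-style iterations on (n, LS, SS) summaries and scores each subcluster j against a centre c with SS_j - 2 c'LS_j + n_j ||c||^2. On simulated data, it then checks that the summary-based RSS equals the RSS computed directly from the raw points.

```r
# Hypothetical sketch: k-means over subcluster summaries (count n, linear sum
# LS with one row per subcluster, sum of squared norms SS).
subcluster_cost <- function(n, LS, SS, centres) {
  # cost[j, g] = SS_j - 2 <c_g, LS_j> + n_j ||c_g||^2, i.e. the sum of squared
  # distances from the points of subcluster j to centre g
  cross <- LS %*% t(centres)
  outer(SS, rep(1, nrow(centres))) - 2 * cross + outer(n, rowSums(centres^2))
}

kmeans_subclusters <- function(n, LS, SS, centres, max_iter = 100) {
  assign <- integer(0)
  for (iter in seq_len(max_iter)) {
    cost <- subcluster_cost(n, LS, SS, centres)
    new_assign <- max.col(-cost, ties.method = "first")  # cheapest centre
    if (identical(new_assign, assign)) break              # converged
    assign <- new_assign
    for (g in seq_len(nrow(centres))) {
      members <- assign == g
      if (any(members))  # keep the old centre if a group becomes empty
        centres[g, ] <- colSums(LS[members, , drop = FALSE]) / sum(n[members])
    }
  }
  cost <- subcluster_cost(n, LS, SS, centres)
  list(RSS = sum(cost[cbind(seq_along(assign), assign)]),
       sub = assign, centres = centres)
}

# Demo: 20 subclusters of 10 simulated points each, in two well-separated groups.
set.seed(42)
x <- rbind(matrix(rnorm(200, 0), ncol = 2), matrix(rnorm(200, 6), ncol = 2))
group <- rep(1:20, each = 10)              # subcluster of each observation
n  <- as.vector(table(group))
LS <- rowsum(x, group)
SS <- as.vector(rowsum(rowSums(x^2), group))
fit <- kmeans_subclusters(n, LS, SS, centres = LS[c(1, 20), ] / n[c(1, 20)])
obs_cluster <- fit$sub[group]              # expand subcluster labels to points
direct_rss <- sum((x - fit$centres[obs_cluster, ])^2)
```

The summaries are enough both to assign subclusters and to update centres. Memory use therefore scales with the number of subclusters rather than the number of observations.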
Returns a list with components:
RSS: The total residual sum-of-squares of the clustering.
clust: A list containing a vector describing which subclusters make up the clustering (sub) and a vector describing the underlying observations that make up the clusters (obs).
For this algorithm to produce meaningful results, the birch object should contain hundreds of subclusters, or better still, thousands.
Justin Harrington harringt@stat.ubc.ca and Matias Salibian-Barrera matias@stat.ubc.ca
Harrington, J and Salibian-Barrera, M (2007) “Finding Approximate Solutions to Combinatorial Problems with Very Large Datasets using BIRCH”, submitted to Statistical Algorithms and Software, 2nd Special Issue Computational Statistics and Data Analysis. A draft can be found at http://www.stat.ubc.ca/~harringt/birch/birch.pdf.
Harrington, J and Salibian-Barrera, M (2008) “birch: Working with very large data sets”, submitted to Journal of Statistical Software. A draft can be found at http://www.stat.ubc.ca/~harringt/birch/birch-jss.pdf.
## Load the birch package and a demo birch object
library(birch)
data(birchObj)

## Perform k-means, specifying the number of clusters
kOut <- kmeans.birch(birchObj, 2, nstart=10)

## Perform k-means, specifying the initial cluster centers
## See dist.clust for one method of initial cluster centers
kOut <- kmeans.birch(birchObj, matrix(c(0,10), ncol=5, nrow=2))

## To plot using the birch object
plot(birchObj, col=kOut$clust$sub)

## To plot using the underlying data (if available)
## Not run: 
plot(x, col=kOut$clust$obs)
## End(Not run)