kmeansGap {SLmisc}R Documentation

k-means Clustering via gap statistic

Description

Perform k-means clustering on a data matrix where the number of clusters k is chosen via the gap statistic.

Usage

kmeansGap(x, iter.max = 10, nstart = 1, 
                algorithm = c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen"), 
                k.max = 20, M = 100, hclust = FALSE)
## S3 method for class 'clusterGap':
plot(x, ...)

Arguments

x A numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns). In case of plot.clusterGap the argument x is an object of class clusterGap.
iter.max The maximum number of iterations allowed in function kmeans.
nstart If centers is a number (i.e., hclust = FALSE), how many random sets should be chosen?
algorithm character, may be abbreviated.
k.max integer, maximum number of clusters.
M integer, number of Monte Carlo samples.
hclust logical, use hclust with method "average" to determine initial cluster centers.
... optional arguments to plot.clusterGap - not yet implemented

Details

For details on k-means clustering see kmeans. The function proceeds computing k-means clustering and the corresponding gap statistic Gap(k) for increasing number of clusters until Gap(k) - Gap(k+1) + s(k+1) >= 0 where s(k+1) is SE of simulation or k.max is reached, respectively.

Value

An object of class "clusterGap" which consist of the gap statistic as well as the clustering result.

Author(s)

Dr. Matthias Kohl (SIRS-Lab GmbH) kohl@sirs-lab.com

References

Forgy, E. W. (1965) Cluster analysis of multivariate data: efficiency vs interpretability of classifications. Biometrics 21, 768-769.

Hartigan, J. A. and Wong, M. A. (1979). A K-means clustering algorithm. Applied Statistics 28, 100-108.

Lloyd, S. P. (1957, 1982) Least squares quantization in PCM. Technical Note, Bell Laboratories. Published in 1982 in IEEE Transactions on Information Theory 28, 128-137.

MacQueen, J. (1967) Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, eds L. M. Le Cam & J. Neyman, 1, pp. 281-297. Berkeley, CA: University of California Press.

T. Hastie, R. Tibshirani and G. Walther (2001). Estimating the number of data clusters via the Gap statistic. J.R. Statist. Soc. B, 63, pp. 411–423.

Tishirani, R., Walther, G. and Hastie, T. (2000) Estimating the number of clusters in a dataset via the Gap statistic. Technical Report. Stanford.

See Also

gap, kmeansGap

Examples

x <- rbind(matrix(rnorm(150, sd = 0.1), ncol= 3),
              matrix(rnorm(150, mean = 1, sd = 0.1), ncol = 3),
              matrix(rnorm(150, mean = 2, sd = 0.1), ncol = 3),
              matrix(rnorm(150, mean = 3, sd = 0.1), ncol = 3))

res <- kmeansGap(x = x, nstart = 10)
plot(res)

[Package SLmisc version 1.4.1 Index]