gapStat {SLmisc} | R Documentation |
Calculates a goodness of clustering measure based on the average dispersion compared to a reference distribution.
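For orientation, the gap statistic as defined in the Tibshirani, Walther and Hastie reference listed below can be written as follows, using M for the number of Monte Carlo reference samples. This is the definition from the paper. Whether gapStat uses exactly this dispersion measure and normalization is not stated on this page, so the correspondence should be read as an assumption.

```latex
\[
W_k = \sum_{r=1}^{k} \frac{1}{2 n_r} \sum_{i, i' \in C_r} d_{ii'},
\qquad
\mathrm{Gap}(k) = \frac{1}{M} \sum_{m=1}^{M} \log W_{k}^{*(m)} - \log W_k
\]
\[
\mathrm{sd}_k = \left[ \frac{1}{M} \sum_{m=1}^{M}
  \left( \log W_{k}^{*(m)} - \bar{l} \right)^2 \right]^{1/2},
\qquad
\bar{l} = \frac{1}{M} \sum_{m=1}^{M} \log W_{k}^{*(m)},
\qquad
s_k = \mathrm{sd}_k \sqrt{1 + 1/M}
\]
```

Here C_r is the r-th cluster with n_r members, d_{ii'} is the dissimilarity between observations i and i', and W_k^{*(m)} is the dispersion computed on the m-th reference sample.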
gapStat(data, class = rep(1, nrow(data)), M = 500)
data | matrix or data.frame containing the data to be clustered (one observation per row) |
class | a vector describing the cluster memberships of the rows of data |
M | integer, number of Monte Carlo samples |
This function is based on the function gap of package "SAGx".
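The SAGx code is not reproduced on this page. The following is a rough base-R sketch of how a Monte Carlo gap statistic for a given clustering might be computed. The function names withinDispersion and gapSketch are hypothetical, and the use of squared Euclidean dispersion, a uniform reference over the bounding box of the data, and re-clustering of each reference sample with kmeans are all assumptions, not a description of what gapStat does internally.

```r
# Hypothetical helper: pooled within-cluster dispersion W_k.
# With squared Euclidean distances, the pairwise sum divided by 2 * n_r
# equals the sum of squares around the cluster mean.
withinDispersion <- function(x, class) {
  x <- as.matrix(x)
  groups <- split(seq_len(nrow(x)), class)
  sum(vapply(groups, function(idx) {
    xr <- x[idx, , drop = FALSE]
    sum(sweep(xr, 2, colMeans(xr))^2)
  }, numeric(1)))
}

# Hypothetical sketch of a Monte Carlo gap statistic (not the SLmisc code).
gapSketch <- function(x, class, M = 100) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  k <- length(unique(class))
  logW <- log(withinDispersion(x, class))
  lo <- apply(x, 2, min)
  hi <- apply(x, 2, max)
  logWref <- replicate(M, {
    # Reference data: uniform over the bounding box of the observed columns.
    ref <- matrix(runif(n * p, min = rep(lo, each = n), max = rep(hi, each = n)),
                  nrow = n)
    # Assumption: each reference sample is re-clustered into k groups.
    refClass <- kmeans(ref, centers = k, nstart = 5)$cluster
    log(withinDispersion(ref, refClass))
  })
  sdRef <- sqrt(mean((logWref - mean(logWref))^2))
  c(gap = mean(logWref) - logW, se = sdRef * sqrt(1 + 1 / M))
}

# Small usage example on two well-separated clusters
set.seed(1)
x2 <- rbind(matrix(rnorm(100, sd = 0.1), ncol = 2),
            matrix(rnorm(100, mean = 1, sd = 0.1), ncol = 2))
cl2 <- kmeans(x2, centers = 2, nstart = 5)$cluster
res2 <- gapSketch(x2, cl2, M = 20)
res2
```

The sketch returns a named vector of length two, mirroring the two components described for gapStat. The numbers it produces are not expected to match those of gapStat.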
vector with components "gap statistic" and "SE of simulation".
Dr. Matthias Kohl (SIRS-Lab GmbH) kohl@sirs-lab.com
T. Hastie, R. Tibshirani and G. Walther (2001). Estimating the number of data clusters via the Gap statistic. J.R. Statist. Soc. B, 63, pp. 411–423.
Tibshirani, R., Walther, G. and Hastie, T. (2000). Estimating the number of clusters in a dataset via the Gap statistic. Technical Report. Stanford.
Per Broberg (2006). SAGx: Statistical Analysis of the GeneChip. R package version 1.9.7. http://home.swipnet.se/pibroberg/expression_hemsida1.html
library(SLmisc)

## four well-separated Gaussian clusters in three dimensions
x <- rbind(matrix(rnorm(150, sd = 0.1), ncol = 3),
           matrix(rnorm(150, mean = 1, sd = 0.1), ncol = 3),
           matrix(rnorm(150, mean = 2, sd = 0.1), ncol = 3),
           matrix(rnorm(150, mean = 3, sd = 0.1), ncol = 3))

## gap statistic (column 1) and simulation SE (column 2) for k = 2, ..., 10
gap.stat <- matrix(NA, ncol = 2, nrow = 9)
for(i in 2:10){
    cl <- kmeans(x, i)
    gap.stat[i-1, ] <- gapStat(x, cl$cluster, M = 100)
}

## choose the number of clusters to be the smallest value such that the
## following is non-negative
(res <- gap.stat[1:8,1] - gap.stat[2:9,1] + gap.stat[2:9,2])
min((2:9)[res >= 0])
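The selection rule at the end of the example (take the smallest k with Gap(k) >= Gap(k+1) - s_{k+1}) can be isolated in a small helper. This is a sketch only: the name chooseK is hypothetical, and the gap and standard-error values below are made up for illustration rather than produced by gapStat.

```r
# Hypothetical helper: smallest k with Gap(k) >= Gap(k + 1) - s_{k + 1}.
# k, gap and se are vectors of equal length, ordered by increasing k.
chooseK <- function(k, gap, se) {
  n <- length(k)
  ok <- gap[-n] >= gap[-1] - se[-1]
  # Return NA when no candidate satisfies the rule.
  if (any(ok)) k[which(ok)[1]] else NA_integer_
}

# Made-up values, purely for illustration
k   <- 2:6
gap <- c(0.50, 0.80, 1.40, 1.35, 1.30)
se  <- c(0.05, 0.05, 0.05, 0.05, 0.05)
chosen <- chooseK(k, gap, se)
chosen
```

Unlike min() applied to an empty selection, which returns Inf with a warning, this helper returns NA when no candidate satisfies the rule, which makes that case easier to detect.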