index.Gap {clusterSim}    R Documentation

Calculates Tibshirani, Walther and Hastie gap index

Description

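Calculates the Tibshirani, Walther and Hastie gap index for a partition into u clusters, together with the difference used to choose the number of clusters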

Usage

index.Gap (x, clall, reference.distribution="unif", B=10, 
        method="pam")

Arguments

x data (n objects in rows, m variables in columns)
clall Two vectors of integers indicating the cluster to which each object is allocated in the partitions of n objects into u and u+1 clusters
reference.distribution "unif": generate each reference variable uniformly over the range of the observed values for that variable. "pc": generate the reference variables from a uniform distribution over a box aligned with the principal components of the data. In detail, if $X=\{x_{ij}\}$ is the n x m data matrix, assume that the columns have mean 0 and compute the singular value decomposition $X=UDV^T$. Transform via $X'=XV$, then draw uniform features $Z'$ over the ranges of the columns of $X'$, as in the "unif" method. Finally, back-transform via $Z=Z'V^T$ to obtain the reference data $Z$
B the number of simulations used to compute the gap statistic
method the cluster analysis method to be used. This should be one of: "ward", "single", "complete", "average", "mcquitty", "median", "centroid", "pam", "k-means"
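
The two options for reference.distribution can be sketched in base R as follows. This is only an illustration of the description above, not the package's own code: the function names ref_unif and ref_pc are hypothetical, and adding the column means back at the end is an assumption (the text only says the columns are assumed to have mean 0).

```r
# Hypothetical helpers illustrating the two reference distributions.

# "unif": draw each variable uniformly over its observed range.
ref_unif <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  apply(x, 2, function(col) runif(n, min = min(col), max = max(col)))
}

# "pc": uniform box aligned with the principal components.
ref_pc <- function(x) {
  x <- as.matrix(x)
  centers <- colMeans(x)
  xc <- sweep(x, 2, centers)      # columns now have mean 0
  V <- svd(xc)$v                  # X = U D V^T
  x_rot <- xc %*% V               # X' = X V
  z_rot <- ref_unif(x_rot)        # Z' uniform over the ranges of X'
  z <- z_rot %*% t(V)             # Z = Z' V^T
  sweep(z, 2, centers, "+")       # assumption: restore the original means
}

set.seed(1)
x <- cbind(rnorm(50), rnorm(50, sd = 3))  # toy data
z_unif <- ref_unif(x)
z_pc <- ref_pc(x)
```

Each call returns one reference data set with the same dimensions as x; B such data sets would be drawn to compute the gap statistic.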

Details

See file $R_HOME\library\clusterSim\pdf\indexGap_details.pdf for further details

Value

Gap Tibshirani, Walther and Hastie gap index for u clusters
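diffu the difference Gap(u)-[Gap(u+1)-s(u+1)] used to choose the number of clusters with the gap statistic; following Tibshirani et al. (2001), the smallest u for which this value is non-negative is selected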
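
As a rough sketch of how these two values relate, the following base R code uses the definitions from the Tibshirani, Walther and Hastie paper: Gap(u) is the mean of log(W) over the B reference data sets minus log(W) for the observed data, and s(u) is the standard deviation of the reference values times sqrt(1+1/B). The helper names gap_stat and diffu_value are hypothetical, the input numbers are made up, and the package itself may differ in details (see the Details section).

```r
# Hypothetical helper: Gap(u) and s(u) from log(W_u) of the data and
# the log(W_ub) values of B reference data sets.
gap_stat <- function(logW, logW_ref) {
  B <- length(logW_ref)
  gap <- mean(logW_ref) - logW
  sd_u <- sqrt(mean((logW_ref - mean(logW_ref))^2))  # divides by B, not B - 1
  list(gap = gap, s = sd_u * sqrt(1 + 1 / B))
}

# diffu = Gap(u) - [Gap(u+1) - s(u+1)]
diffu_value <- function(g_u, g_u1) g_u$gap - (g_u1$gap - g_u1$s)

# Made-up log(W) values for u = 4 and u + 1 = 5, for illustration only.
g4 <- gap_stat(logW = 2.0, logW_ref = c(2.4, 2.6, 2.5, 2.5))
g5 <- gap_stat(logW = 1.9, logW_ref = c(2.3, 2.5, 2.4, 2.4))
d4 <- diffu_value(g4, g5)
```

With these numbers d4 is non-negative, so u = 4 would be accepted provided no smaller u already satisfied the rule.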

Author(s)

Marek Walesiak Marek.Walesiak@ae.jgora.pl, Andrzej Dudek Andrzej.Dudek@ae.jgora.pl

Department of Econometrics and Computer Science, University of Economics, Wroclaw, Poland http://www.ae.jgora.pl/keii

References

Tibshirani, R., Walther, G., Hastie, T. (2001), Estimating the number of clusters in a data set via the gap statistic, "Journal of the Royal Statistical Society", ser. B, vol. 63, part 2, 411-423.

See Also

index.G1, index.G2, index.G3, index.S, index.H, index.KL

Examples

library(clusterSim)
data(data_ratio)
# partitions into u = 4 and u + 1 = 5 clusters
cl1 <- pam(data_ratio, 4)
cl2 <- pam(data_ratio, 5)
# one column of cluster memberships per partition
clall <- cbind(cl1$clustering, cl2$clustering)
g <- index.Gap(data_ratio, clall, reference.distribution="unif", B=10,
   method="pam")
print(g)
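
For the hierarchical methods listed under method, clall could plausibly be built from a single tree with base R's hclust and cutree. The sketch below uses toy data rather than data_ratio, and the choice of u = 2 is arbitrary.

```r
# Base R sketch: build a two-column clall from one hierarchical tree.
set.seed(42)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))  # toy data, two groups
hc <- hclust(dist(x), method = "complete")
# memberships for u = 2 and u + 1 = 3 clusters, one column each
clall <- cbind(cutree(hc, k = 2), cutree(hc, k = 3))
```

The resulting clall can then be passed to index.Gap in the same way as in the example, presumably with method="complete" so that the reference data sets are clustered by the same algorithm.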

[Package clusterSim version 0.30-7 Index]