cluster.stats {fpc}    R Documentation

Cluster validation statistics

Description

Computes a number of distance-based statistics that can be used for cluster validation, for comparing clusterings, and for deciding on the number of clusters: cluster sizes, cluster diameters, average distances within and between clusters, cluster separation, average silhouette widths, the distance-based statistics that performed best for deciding on the number of clusters in the study of Milligan and Cooper (1985), Hubert's gamma coefficient, the Dunn index, and the corrected Rand index for assessing the similarity of two clusterings.

Usage

cluster.stats(d,clustering,alt.clustering=NULL,
                          silhouette=TRUE,G2=FALSE,G3=FALSE)

Arguments

d a distance object (as generated by dist) or a distance matrix between cases.
clustering an integer vector whose length equals the number of cases, indicating a clustering. The clusters have to be numbered from 1 to the number of clusters.
alt.clustering an integer vector of the same form as clustering, indicating an alternative clustering. If provided, the corrected Rand index for clustering vs. alt.clustering is computed.
silhouette logical. If TRUE, the silhouette statistics are computed, which requires package cluster.
G2 logical. If TRUE, Goodman and Kruskal's index G2 (cf. Gordon (1999), p. 62) is computed. This involves a large amount of sorting and can be very slow (it has been improved by R. Francois - thanks!).
G3 logical. If TRUE, the index G3 (cf. Gordon (1999), p. 62) is computed. This executes sort on all distances and can be extremely slow.
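
The requirement that clusters be numbered from 1 to the number of clusters matters when labels come from another source, for example character labels or integer codes with gaps. A minimal base R sketch of one way to recode such labels; the vector raw.labels and its values are hypothetical and only serve as an illustration:

```r
# Hypothetical labels with gaps and no cluster numbered 1
raw.labels <- c(10, 10, 30, 30, 30, 7)

# factor() collects the distinct labels (sorted);
# as.integer() maps them to the codes 1..k
clustering <- as.integer(factor(raw.labels))

# Check the coding described above: integers 1..k without gaps
k <- max(clustering)
stopifnot(setequal(clustering, seq_len(k)))
```

The recoded vector can then be passed as clustering (or alt.clustering). Note that the new numbers follow the sorted order of the original labels, not their order of appearance.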

Value

cluster.stats returns a list containing the components n, cluster.number, cluster.size, diameter, average.distance, median.distance, separation, average.toother, separation.matrix, average.between, average.within, n.between, n.within, clus.avg.silwidths, avg.silwidth, g2, g3, hubertgamma, dunn, wb.ratio, corrected.rand.

n number of cases.
cluster.number number of clusters.
cluster.size vector of cluster sizes (number of points).
diameter vector of cluster diameters (maximum within cluster distances).
average.distance vector of clusterwise within cluster average distances.
median.distance vector of clusterwise within cluster distance medians.
separation vector of clusterwise minimum distances of a point in the cluster to a point of another cluster.
average.toother vector of clusterwise average distances of a point in the cluster to the points of other clusters.
separation.matrix matrix of separation values between all pairs of clusters.
average.between average distance between clusters.
average.within average distance within clusters.
n.between number of distances between clusters.
n.within number of distances within clusters.
clus.avg.silwidths vector of cluster average silhouette widths. See silhouette.
avg.silwidth average silhouette width. See silhouette.
g2 Goodman and Kruskal's Gamma coefficient. See Milligan and Cooper (1985), Gordon (1999, p. 62).
g3 G3 coefficient. See Gordon (1999, p. 62).
hubertgamma correlation between distances and a 0-1-vector where 0 means same cluster, 1 means different clusters. See Halkidi et al. (2002).
dunn minimum separation / maximum diameter. Dunn index, see Halkidi et al. (2002).
wb.ratio average.within/average.between.
corrected.rand corrected Rand index (if alt.clustering has been specified), see Gordon (1999, p. 198).
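
Several of the components above follow directly from their one-line definitions. The following base R sketch recomputes diameter, separation, dunn, wb.ratio and hubertgamma by hand on a small toy data set. It illustrates the definitions only and is not the package's implementation; in particular, it assumes that average.within and average.between give every distance equal weight, which may differ from what cluster.stats does. The data values are made up.

```r
# Toy data: six points on a line, two clusters (illustrative values only)
x <- c(0, 1, 2, 10, 11, 13)
clustering <- c(1, 1, 1, 2, 2, 2)

dmat <- as.matrix(dist(x))        # full symmetric distance matrix
k <- max(clustering)

# TRUE where both cases belong to the same cluster
same <- outer(clustering, clustering, "==")
upper <- upper.tri(dmat)          # use each pair of cases once

within  <- dmat[upper & same]     # within-cluster distances
between <- dmat[upper & !same]    # between-cluster distances

# diameter: maximum within-cluster distance, per cluster
diameter <- sapply(seq_len(k), function(i)
  max(dmat[clustering == i, clustering == i]))

# separation: minimum distance from a point in the cluster
# to a point of another cluster
separation <- sapply(seq_len(k), function(i)
  min(dmat[clustering == i, clustering != i]))

# dunn: minimum separation / maximum diameter
dunn <- min(separation) / max(diameter)

# wb.ratio: average.within / average.between
# (assumption: unweighted means over all distances)
wb.ratio <- mean(within) / mean(between)

# hubertgamma: correlation between the distances and a 0-1 vector
# (0 = same cluster, 1 = different clusters)
hubertgamma <- cor(dmat[upper], as.numeric(!same[upper]))
```

For well-separated clusters such as these, dunn is large, wb.ratio is small and hubertgamma is close to 1.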

Author(s)

Christian Hennig chrish@stats.ucl.ac.uk http://www.homepages.ucl.ac.uk/~ucakche/

References

Gordon, A. D. (1999) Classification, 2nd ed. Chapman and Hall.

Halkidi, M., Batistakis, Y., Vazirgiannis, M. (2002) Cluster validity methods, SIGMOD Record, 31, 40-45.

Milligan, G. W. and Cooper, M. C. (1985) An examination of procedures for determining the number of clusters. Psychometrika, 50, 159-179.

See Also

silhouette, dist

clusterboot computes clusterwise stability statistics by resampling.

Examples

  
  set.seed(20000)
  face <- rFace(200,dMoNo=2,dNoEy=0,p=2)
  dface <- dist(face)
  complete3 <- cutree(hclust(dface),3)
  cluster.stats(dface,complete3,
                alt.clustering=as.integer(attr(face,"grouping")))
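
The call above passes alt.clustering, so the corrected Rand index is computed. As a rough illustration of what such an index measures, here is a base R sketch of the usual chance-corrected Rand index (the Hubert and Arabie form). That this matches the package's computation exactly is an assumption, and corrected.rand.sketch is a hypothetical helper, not part of fpc.

```r
# Hypothetical helper, not part of fpc: chance-corrected Rand index
# for two clusterings a and b of the same cases
corrected.rand.sketch <- function(a, b) {
  tab <- table(a, b)                        # cross-tabulation of the clusterings
  pairs <- function(m) sum(choose(as.vector(m), 2))
  n.pairs <- choose(length(a), 2)           # all pairs of cases
  index <- pairs(tab)                       # pairs together in both clusterings
  row.p <- pairs(rowSums(tab))              # pairs together in a
  col.p <- pairs(colSums(tab))              # pairs together in b
  expected <- row.p * col.p / n.pairs       # expected index under random labelling
  (index - expected) / ((row.p + col.p) / 2 - expected)
}

a <- c(1, 1, 1, 2, 2, 2)
b <- c(2, 2, 2, 1, 1, 1)    # same partition as a, labels swapped
c2 <- c(1, 2, 1, 2, 1, 2)   # partition unrelated to a

same.partition <- corrected.rand.sketch(a, b)    # equals 1
unrelated      <- corrected.rand.sketch(a, c2)   # near or below 0
```

The index does not depend on how the clusters are numbered, only on which cases are grouped together. The sketch does not handle the degenerate case in which the denominator is zero (for example, both clusterings consist of a single cluster).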
  

[Package fpc version 1.2-3 Index]