agreement {clue}    R Documentation

Agreement Between Partitions or Hierarchies

Description

Compute the agreement between (ensembles of) partitions or hierarchies.

Usage

cl_agreement(x, y = NULL, method = "euclidean")

Arguments

x an ensemble of partitions or hierarchies, or something coercible to that (see cl_ensemble).
y NULL (default), or as for x.
method a character string specifying one of the built-in methods for computing agreement, or a function to be taken as a user-defined method. If a character string, its lower-cased version is matched against the lower-cased names of the available built-in methods using pmatch. See Details for available built-in methods.
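
The matching rule for character method names can be sketched in base R as follows. This is only an illustration of lower-cased partial matching with pmatch: the vector builtin simply lists the partition method names from the Details section, and match_method is a hypothetical helper, not a function exported by clue.

```r
# Illustrative list of method names (copied from the Details section).
builtin <- c("euclidean", "manhattan", "Rand", "cRand", "NMI", "KP",
             "angle", "diag")

# Hypothetical helper: lower-case both sides, then match partially.
match_method <- function(method) {
  ind <- pmatch(tolower(method), tolower(builtin))
  if (is.na(ind))
    stop("invalid or ambiguous method: ", method)
  builtin[ind]
}

match_method("rand")   # exact match after lower-casing
match_method("euc")    # unique prefix of "euclidean"
match_method("nmi")    # case is ignored
```

Under this rule, an abbreviation is accepted only if it identifies exactly one method name.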

Details

If y is given, its components must be of the same kind as those of x (i.e., components must either all be partitions, or all be hierarchies).

If all components are partitions, the following built-in methods for measuring agreement between two partitions with respective membership matrices u and v (brought to a common number of columns) are available:

"euclidean"
1 - d / m, where d is the Euclidean dissimilarity of the memberships, i.e., the square root of the minimal sum of the squared differences of u and all column permutations of v, and m is an upper bound for the maximal Euclidean dissimilarity. See Dimitriadou, Weingessel and Hornik (2002).
"manhattan"
1 - d / m, where d is the Manhattan dissimilarity of the memberships, i.e., the minimal sum of the absolute differences of u and all column permutations of v, and m is an upper bound for the maximal Manhattan dissimilarity.
"Rand"
The Rand index (the rate of distinct pairs of objects both in the same class or both in different classes in both partitions), see Rand (1971) or Gordon (1999), page 198. For soft partitions, (currently) the Rand index of the corresponding “nearest” hard partitions is used.
"cRand"
The Rand index corrected for agreement by chance, see Hubert and Arabie (1985) or Gordon (1999), page 198. Can only be used for hard partitions.
"NMI"
Normalized Mutual Information, see Strehl and Ghosh (2002). For soft partitions, (currently) the NMI of the corresponding “nearest” hard partitions is used.
"KP"
The Katz-Powell index, i.e., the product-moment correlation coefficient between the elements of the co-membership matrices C(u) = u u' and C(v), respectively, see Katz and Powell (1953). For soft partitions, (currently) the Katz-Powell index of the corresponding “nearest” hard partitions is used. (Note that for hard partitions, the (i,j) entry of C(u) is one iff objects i and j are in the same class.)
"angle"
The maximal cosine of the angle between the elements of u and all column permutations of v.
"diag"
The maximal co-classification rate, i.e., the maximal rate of objects with the same class ids in both partitions after arbitrarily permuting the ids.
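
As a rough illustration of two of the measures above, the following base R sketch computes the Rand index and the "diag" co-classification rate for two small hard partitions given as vectors of class ids. The data and the helper names (rand_index, all_perms, diag_agreement) are made up for this example. The sketch tries every relabeling by brute force, which is only feasible for a handful of classes, and it is not meant to mirror how the package computes these quantities.

```r
# Two hard partitions of six objects, given as class ids (illustrative data).
a <- c(1, 1, 1, 2, 2, 2)
b <- c(2, 2, 1, 1, 3, 3)

# Rand index: share of object pairs that are either together in both
# partitions or separated in both partitions.
rand_index <- function(a, b) {
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  pairs <- upper.tri(same_a)            # count each unordered pair once
  mean(same_a[pairs] == same_b[pairs])
}

# All permutations of a short vector (brute force, for small inputs only).
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v))
    for (p in all_perms(v[-i]))
      out[[length(out) + 1]] <- c(v[i], p)
  out
}

# "diag": maximal rate of objects with matching class ids after
# relabeling the classes of b in the best possible way.
diag_agreement <- function(a, b) {
  k <- max(a, b)
  tab <- table(factor(a, levels = seq_len(k)),
               factor(b, levels = seq_len(k)))
  hits <- vapply(all_perms(seq_len(k)),
                 function(p) as.numeric(sum(tab[cbind(seq_len(k), p)])),
                 numeric(1))
  max(hits) / length(a)
}

rand_index(a, b)       # 10 of the 15 pairs agree
diag_agreement(a, b)   # 4 of the 6 objects can be matched
```

Both measures equal one when the two partitions differ only in how their classes are labeled.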

If all components are hierarchies, available built-in methods for measuring agreement between two hierarchies with respective ultrametrics u and v are as follows.

"euclidean"
1 / (1 + d), where d is the Euclidean dissimilarity of the ultrametrics (i.e., the square root of the sum of the squared differences of u and v).
"manhattan"
1 / (1 + d), where d is the Manhattan dissimilarity of the ultrametrics (i.e., the sum of the absolute differences of u and v).
"cophenetic"
The cophenetic correlation coefficient. (I.e., the product-moment correlation of the ultrametrics.)
"angle"
The cosine of the angle between the ultrametrics.
"gamma"
1 - d, where d is the rate of inversions between the associated ultrametrics (i.e., the rate of pairs (i,j) and (k,l) for which u_{ij} < u_{kl} and v_{ij} > v_{kl}). (This agreement measure is a linear transformation of Kruskal's gamma.)
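
The hierarchy measures can be written out directly from two ultrametrics. The base R sketch below obtains them with cophenetic from two hclust fits and applies the formulas above. The variable names and the helper gamma_agreement are illustrative. For "gamma", the sketch assumes that the rate is taken over all unordered pairs of object pairs and that only strictly reversed orderings count as inversions; the normalization used by the package may differ.

```r
# Ultrametrics (cophenetic distances) of two hierarchies on the same data.
d <- dist(USArrests)
u <- as.vector(cophenetic(hclust(d, "single")))
v <- as.vector(cophenetic(hclust(d, "complete")))

agree_euclidean  <- 1 / (1 + sqrt(sum((u - v)^2)))
agree_manhattan  <- 1 / (1 + sum(abs(u - v)))
agree_cophenetic <- cor(u, v)                        # product-moment correlation
agree_angle      <- sum(u * v) / sqrt(sum(u^2) * sum(v^2))

# "gamma": one minus the rate of inversions, i.e. of pairs of entries
# that are ordered one way in u and the opposite way in v.
gamma_agreement <- function(u, v) {
  du <- sign(outer(u, u, "-"))
  dv <- sign(outer(v, v, "-"))
  pairs <- upper.tri(du)                 # each pair of entries once
  1 - mean(du[pairs] * dv[pairs] < 0)
}

agree_gamma <- gamma_agreement(u, v)
c(euclidean = agree_euclidean, manhattan = agree_manhattan,
  cophenetic = agree_cophenetic, angle = agree_angle, gamma = agree_gamma)
```

Note that the "euclidean" and "manhattan" values depend on the scale of the ultrametrics, whereas the other three measures do not.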

If a user-defined agreement method is to be employed, it must be a function taking two clusterings as its arguments.
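
A user-defined method might look like the following sketch, which computes a Jaccard-type index on co-membership pairs. The names jaccard_ids and jaccard_method are hypothetical. The wrapper assumes that class ids can be extracted from each clustering with cl_class_ids from clue, and the call to cl_agreement is only attempted if the package is installed.

```r
# Jaccard-type index on hard class ids: among object pairs that are together
# in at least one partition, the share that are together in both.
# (Undefined if no pair is together in either partition.)
jaccard_ids <- function(a, b) {
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  pairs <- upper.tri(same_a)
  sum(same_a[pairs] & same_b[pairs]) / sum(same_a[pairs] | same_b[pairs])
}

# Wrapper with the required signature: a function of two clusterings.
jaccard_method <- function(x, y)
  jaccard_ids(as.integer(clue::cl_class_ids(x)),
              as.integer(clue::cl_class_ids(y)))

# Use it as the agreement method, provided clue is available.
if (requireNamespace("clue", quietly = TRUE)) {
  library("clue")
  data("CKME")
  print(cl_agreement(CKME[1:5], method = jaccard_method))
}
```

Keeping the numerical work in a helper that takes plain id vectors makes the measure easy to check independently of the ensemble machinery.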

Symmetric agreement objects of class "cl_agreement" are implemented as symmetric proximity objects with self-proximities identical to one, and inherit from class "cl_proximity". They can be coerced to dense square matrices using as.matrix. It is possible to use 2-index matrix-style subscripting for such objects; unless this uses identical row and column indices, this results in a (non-symmetric agreement) object of class "cl_cross_agreement".

Value

If y is NULL, an object of class "cl_agreement" containing the agreements between all pairs of components of x. Otherwise, an object of class "cl_cross_agreement" with the agreements between the components of x and the components of y.

References

E. Dimitriadou, A. Weingessel and K. Hornik (2002). A combination scheme for fuzzy clustering. International Journal of Pattern Recognition and Artificial Intelligence, 16, 901–912.

A. D. Gordon (1999). Classification (2nd edition). Boca Raton, FL: Chapman & Hall/CRC.

L. Hubert and P. Arabie (1985). Comparing partitions. Journal of Classification, 2, 193–218.

W. M. Rand (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66, 846–850.

L. Katz and J. H. Powell (1953). A proposed index of the conformity of one sociometric measurement to another. Psychometrika, 18, 249–256.

A. Strehl and J. Ghosh (2002). Cluster ensembles: A knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3, 583–617.

See Also

cl_dissimilarity; classAgreement in package e1071.

Examples

## An ensemble of partitions.
library("clue")
data("CKME")
pens <- CKME[1:20]              # use only the first 20 partitions to save time
summary(c(cl_agreement(pens)))
summary(c(cl_agreement(pens, method = "Rand")))
summary(c(cl_agreement(pens, method = "diag")))
cl_agreement(pens[1:5], pens[6:7], method = "NMI")
## Equivalently, using subscripting.
cl_agreement(pens, method = "NMI")[1:5, 6:7]

## An ensemble of hierarchies.
d <- dist(USArrests)
hclust_methods <- c("ward", "single", "complete", "average",
                    "mcquitty", "median", "centroid")
hclust_results <- lapply(hclust_methods, function(m) hclust(d, m))
hens <- cl_ensemble(list = hclust_results)
names(hens) <- hclust_methods 
summary(c(cl_agreement(hens)))
summary(c(cl_agreement(hens, method = "cophenetic")))
cl_agreement(hens[1:3], hens[4:5], method = "gamma")

[Package clue version 0.2-3]