consensus {clue} | R Documentation
Compute the consensus clustering of an ensemble of partitions or hierarchies.
cl_consensus(x, method = NULL, weights = 1, control = list())
x: an ensemble of partitions or hierarchies, or something coercible to that (see cl_ensemble).
method: a character string specifying one of the built-in methods for computing consensus clusterings, a function to be taken as a user-defined method, or NULL (the default). If a character string, its lower-cased version is matched against the lower-cased names of the available built-in methods using pmatch. See Details for available built-in methods and defaults.
weights: a numeric vector with non-negative case weights, recycled to the number of elements in the ensemble given by x if necessary.
control: a list of control parameters. See Details.
Consensus clusterings “synthesize” the information in the elements of a cluster ensemble into a single clustering, often by minimizing a criterion function measuring how dissimilar consensus candidates are from (the elements of) the ensemble (the so-called “optimization approach” to consensus clustering).
The most popular criterion functions are of the form L(x) = sum_b w_b d(x_b, x)^p, where d is a suitable dissimilarity measure (see cl_dissimilarity), w_b is the case weight given to element x_b of the ensemble, and p >= 1. If p = 1 and minimization is over all possible base clusterings, a consensus solution is called a median of the ensemble; if minimization is restricted to the elements of the ensemble, a consensus solution is called a medoid (see cl_medoid). For p = 2, we obtain least squares consensus partitions and hierarchies (generalized means). See also Gordon (1999) for more information.
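As a concrete illustration of the criterion above, the value L(x) for a medoid candidate (p = 1, minimization restricted to the elements of the ensemble) can be evaluated directly. This is a minimal sketch assuming the clue package is installed; the Kinship82 data set ships with clue.

```r
library(clue)
data("Kinship82")
## Medoid: the ensemble element minimizing L(x) = sum_b w_b d(x_b, x)
## when minimization is restricted to the elements of the ensemble.
m <- cl_medoid(Kinship82)
## Criterion value for this candidate (unit case weights, p = 1):
sum(cl_dissimilarity(Kinship82, m))
```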
If all elements of the ensemble are partitions, the built-in consensus methods compute soft least squares consensus partitions for Euclidean, GV1, and co-membership dissimilarities (i.e., minima of L(x) = sum_b w_b d(x_b, x)^2 over all soft partitions with k classes). Available methods are as follows.
"DWH"
The following control parameters are available for this method.
k
order
"GV1"
The following control parameters are available for this method.
k
maxiter
reltol
sqrt(.Machine$double.eps)
.start
verbose
getOption("verbose")
.
"SE"
"GV1"
(which however is based on GV1 rather than Euclidean
dissimilarity). Available control parameters are the same as for
"GV1"
.
"GV3"
ls_fit_ultrametric
for more information on the SUMT approach.) This optimization
problem is equivalent to finding the membership matrix m for
which the sum of the squared differences between C(m) = m m'
and the weighted average co-membership matrix sum_b w_b
C(m_b) of the partitions is minimal.
Availabe control parameters are method
, control
,
eps
, q
, and verbose
, which have the same
roles as for ls_fit_ultrametric
, and the following.
k
start
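The co-membership identity C(m) = m m' used above can be checked directly in base R. This is a minimal sketch with a made-up membership matrix; no packages are needed.

```r
## A soft partition of 3 objects into 2 classes (rows sum to one).
m <- matrix(c(0.9, 0.1,
              0.2, 0.8,
              0.5, 0.5),
            nrow = 3, byrow = TRUE)
## Co-membership matrix C(m) = m %*% t(m): entry (i, j) gives the
## "probability" that objects i and j are assigned to the same class.
C <- m %*% t(m)
C
```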
By default, method "DWH"
is used for ensembles of partitions.
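A minimal call using the defaults might then look as follows; this is a sketch assuming the clue package is installed (the Kinship82 data set ships with clue).

```r
library(clue)
data("Kinship82")
## With method = NULL (the default), the default partition consensus
## method is used; control parameters such as k may still be supplied.
m <- cl_consensus(Kinship82, control = list(k = 3))
```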
If all elements of the ensemble are hierarchies, the following built-in methods for computing consensus hierarchies are available.
"cophenetic"
ls_fit_ultrametric
on d with appropriate
control parameters."majority"
The fraction p can be specified via the control parameter
p
.
By default, method "cophenetic"
is used for ensembles of
hierarchies.
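For hierarchies, a sketch of the default usage might look as follows, assuming the clue package is installed; the ensemble here is built from hclust trees on a small toy distance matrix.

```r
library(clue)
## An ensemble of hierarchies from different linkage methods.
d <- dist(USArrests[1:10, ])
hens <- cl_ensemble(single   = hclust(d, "single"),
                    complete = hclust(d, "complete"),
                    average  = hclust(d, "average"))
## Default consensus method for ensembles of hierarchies ("cophenetic"):
cl_consensus(hens)
```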
If a user-defined consensus method is to be employed, it must be a function taking the cluster ensemble, the case weights, and a list of control parameters as its arguments.
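For example, a user-defined method could be sketched as follows; the function name is illustrative, and the toy method simply returns the medoid of the ensemble (assuming the clue package is installed).

```r
library(clue)
## A user-defined consensus method: a function of the cluster ensemble,
## the case weights, and a list of control parameters.
my_consensus <- function(clusterings, weights, control)
    cl_medoid(clusterings)
data("Kinship82")
m <- cl_consensus(Kinship82, method = my_consensus)
```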
All built-in methods use heuristics for solving hard optimization problems and cannot be guaranteed to find a global minimum. Standard practice is to use the best solution found across “sufficiently many” replications of the method.
The consensus partition or hierarchy.
E. Dimitriadou and A. Weingessel and K. Hornik (2002). A combination scheme for fuzzy clustering. International Journal of Pattern Recognition and Artificial Intelligence, 16, 901–912.
A. D. Gordon and M. Vichi (2001). Fuzzy partition models for fitting a set of partitions. Psychometrika, 66, 229–248.
A. D. Gordon (1999). Classification (2nd edition). Boca Raton, FL: Chapman & Hall/CRC.
T. Margush and F. R. McMorris (1981). Consensus n-trees. Bulletin of Mathematical Biology, 43, 239–244.
## Consensus partition for the Rosenberg-Kim kinship terms partition
## data based on co-membership dissimilarities.
data("Kinship82")
m1 <- cl_consensus(Kinship82, method = "GV3",
                   control = list(k = 3, verbose = TRUE))
## (Note that one should really use several replicates of this.)
## Value for criterion function to be minimized:
sum(cl_dissimilarity(Kinship82, m1, "comem") ^ 2)
## Compare to the consensus solution given in Gordon & Vichi (2001).
data("Kinship82_Consensus")
m2 <- Kinship82_Consensus[["JMF"]]
sum(cl_dissimilarity(Kinship82, m2, "comem") ^ 2)
## Seems we get a better solution ...
## How dissimilar are these solutions?
cl_dissimilarity(m1, m2, "comem")
## How "fuzzy" are they?
cl_fuzziness(cl_ensemble(m1, m2))
## Do the "nearest" hard partitions fully agree?
cl_dissimilarity(as.cl_hard_partition(m1),
                 as.cl_hard_partition(m2))

## Consensus partition for the Gordon and Vichi (2001) macroeconomic
## partition data based on Euclidean dissimilarities.
data("GVME")
set.seed(1)
## First, using k = 2 classes.
m1 <- cl_consensus(GVME, method = "GV1",
                   control = list(k = 2, verbose = TRUE))
## (Note that one should really use several replicates of this.)
## Value of criterion function to be minimized:
sum(cl_dissimilarity(GVME, m1, "GV1") ^ 2)
## Compare to the consensus solution given in Gordon & Vichi (2001).
data("GVME_Consensus")
m2 <- GVME_Consensus[["MF1/2"]]
sum(cl_dissimilarity(GVME, m2, "GV1") ^ 2)
## Seems we get a slightly better solution ...
## But note that
cl_dissimilarity(m1, m2, "GV1")
## and that the maximal deviation of the memberships is
max(abs(cl_membership(m1) - cl_membership(m2)))
## so the differences seem to be due to rounding.
## Do the "nearest" hard partitions fully agree?
table(cl_class_ids(m1), cl_class_ids(m2))

## And now for k = 3 classes.
m1 <- cl_consensus(GVME, method = "GV1",
                   control = list(k = 3, verbose = TRUE))
sum(cl_dissimilarity(GVME, m1, "GV1") ^ 2)
## Compare to the consensus solution given in Gordon & Vichi (2001).
m2 <- GVME_Consensus[["MF1/3"]]
sum(cl_dissimilarity(GVME, m2, "GV1") ^ 2)
## This time we look much better ...
## How dissimilar are these solutions?
cl_dissimilarity(m1, m2, "GV1")
## Do the "nearest" hard partitions fully agree?
table(cl_class_ids(m1), cl_class_ids(m2))