pclust {clue}    R Documentation

Prototype-Based Partitions of Clusterings

Description

Compute prototype-based partitions of a cluster ensemble by minimizing sum_{b,j} u_{bj}^m d(x_b, p_j), the sum of the membership-weighted Euclidean dissimilarities between the elements x_b of the ensemble and the prototypes p_j, where u_{bj} is the membership of x_b in class j.

Usage

cl_pclust(x, k, m = 1, control = list())

Arguments

x an ensemble of partitions or hierarchies, or something coercible to that (see cl_ensemble).
k an integer giving the number of classes to be used in the partition.
m a number not less than 1 controlling the softness of the partition (as the “fuzzification parameter” of the fuzzy c-means algorithm). The default value of 1 corresponds to hard partitions obtained from a generalized k-means problem; values greater than one give partitions of increasing softness obtained from a generalized fuzzy c-means problem.
control a list of control parameters. See Details.

Details

For m = 1, a generalization of the Lloyd-Forgy variant of the k-means algorithm is used, which iterates between reclassifying objects to their closest prototypes and computing new prototypes as the class medians. This algorithm may result in degenerate solutions and will eventually be replaced by a Hartigan-Wong style algorithm.
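
The reclassification step for m = 1 can be illustrated with a minimal base R sketch. It is not the code used by cl_pclust; the matrix d of dissimilarities d(x_b, p_j) is made up for illustration, with one row per ensemble element and one column per prototype.

```r
## Hypothetical 5 x 3 matrix of dissimilarities d(x_b, p_j):
## rows are ensemble elements, columns are prototypes.
d <- matrix(c(0.1, 0.9, 0.8,
              0.7, 0.2, 0.9,
              0.6, 0.8, 0.3,
              0.2, 0.7, 0.9,
              0.9, 0.1, 0.5), nrow = 5, byrow = TRUE)

## Hard reclassification: each element goes to its closest prototype.
hard_ids <- apply(d, 1, which.min)

## Value of the criterion for m = 1: the sum of the dissimilarities
## between each element and the prototype it is assigned to.
hard_obj <- sum(d[cbind(seq_len(nrow(d)), hard_ids)])
```

In a full iteration, the prototypes would then be recomputed as the medians of the resulting classes and d would be updated accordingly.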

For m > 1, a generalization of the fuzzy c-means recipe (e.g., Bezdek (1981)) is used, which alternates between computing optimal memberships for fixed prototypes, and computing new prototypes as the class medians.
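
The membership step for m > 1 can be sketched in base R as follows. This is an illustration under stated assumptions, not the code used by cl_pclust: the function name fuzzy_memberships and the matrix d are hypothetical, and all dissimilarities are assumed to be strictly positive. For fixed prototypes, minimizing sum_j u_{bj}^m d(x_b, p_j) subject to the memberships of each element summing to one gives u_{bj} proportional to d(x_b, p_j)^(-1 / (m - 1)).

```r
## Hypothetical helper: optimal memberships for fixed prototypes.
## d is a matrix of dissimilarities d(x_b, p_j) (rows: elements,
## columns: prototypes); m > 1 is the softness parameter.
fuzzy_memberships <- function(d, m) {
  stopifnot(m > 1, all(d > 0))   # zero dissimilarities are not handled here
  w <- d^(-1 / (m - 1))          # unnormalized memberships
  w / rowSums(w)                 # normalize each row to sum to one
}

## Made-up dissimilarities for five elements and three prototypes.
d <- matrix(c(0.1, 0.9, 0.8,
              0.7, 0.2, 0.9,
              0.6, 0.8, 0.3,
              0.2, 0.7, 0.9,
              0.9, 0.1, 0.5), nrow = 5, byrow = TRUE)

u_soft  <- fuzzy_memberships(d, m = 2)
u_crisp <- fuzzy_memberships(d, m = 1.1)   # closer to a hard partition
```

Values of m close to 1 concentrate each row of memberships on the closest prototype, while larger values spread them more evenly.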

In either case, the iteration is repeated until convergence occurs or the maximal number of iterations is reached.

Class medians are computed using cl_median.

Available control parameters are as follows.

maxiter
an integer giving the maximal number of iterations to be performed. Defaults to 100.
reltol
the relative convergence tolerance. Defaults to sqrt(.Machine$double.eps).
method
the method to be used in cl_median.
control
control parameters to be used in cl_median.
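
As a small illustration of the parameters listed above, a control list could be assembled as follows. The particular values are arbitrary choices for the example, not recommendations, and the call to cl_pclust is left as a comment because it requires the clue package and the CKME data.

```r
## Control list overriding the iteration limit and the tolerance;
## the values are arbitrary and chosen for illustration only.
ctrl <- list(maxiter = 200L,
             reltol  = 1e-8)

## With clue attached and data("CKME") loaded, the list would be
## passed through the 'control' argument:
## x <- cl_pclust(CKME, 3, m = 2, control = ctrl)
```

Parameters not given in the list keep their defaults.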

The fixed-point approach employed is a heuristic that cannot be guaranteed to find the global minimum (this is already true of the computation of cluster medians). Standard practice is to use the best solution found in “sufficiently many” replications of the base algorithm.
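
The replication strategy can be sketched with base R's kmeans as a stand-in, since the components documented for the value of cl_pclust do not include the attained criterion value. The simulated data, the number of replications, and the variable names are assumptions made for the example.

```r
set.seed(1)

## Simulated two-dimensional data with two well-separated groups.
pts <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 4), ncol = 2))

## Run the base algorithm several times from random starts ...
fits <- lapply(1:10, function(i) kmeans(pts, centers = 2))

## ... and keep the run with the smallest criterion value
## (the total within-cluster sum of squares for kmeans).
crit <- sapply(fits, function(f) f$tot.withinss)
best <- fits[[which.min(crit)]]
```

The same pattern would apply to any fitting routine for which the criterion value of each run can be computed and compared.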

Value

An object of class "cl_pclust" representing the obtained “secondary” partition, which is a list with the following components.

prototypes a cluster ensemble with the k prototypes.
membership an object of class "cl_membership" with the membership values u_{bj}.
cluster the class ids of the “nearest” hard partition.
silhouette silhouette information for the partition, see silhouette.
validity precomputed validity measures for the partition.
m the softness control argument.

References

J. C. Bezdek (1981). Pattern recognition with fuzzy objective function algorithms. New York: Plenum.

See Also

kmeans, cmeans.

Examples

## Use a precomputed ensemble of 50 k-means partitions of the
## Cassini data.
data("CKME")
CKME <- CKME[1 : 30]            # for saving precious time ...
diss <- cl_dissimilarity(CKME)
hc <- hclust(diss)
plot(hc)
## This suggests using a partition with three classes, which can be
## obtained using cutree(hc, 3).  Could use cl_median() to compute
## prototypes as the medians of the classes, or alternatively:
x1 <- cl_pclust(CKME, 3, m = 1)
x2 <- cl_pclust(CKME, 3, m = 2)
## Agreement of solutions.
cl_dissimilarity(x1, x2)
table(cl_class_ids(x1), cl_class_ids(x2))

[Package clue version 0.1-0 Index]