dissplot {seriation} | R Documentation |
Visualizes a dissimilarity matrix using seriation and matrix shading. Entries with lower dissimilarities (higher similarity) are plotted darker. Such a plot can be used to uncover hidden structure in the data.
The plot can also be used to visualize cluster quality (see Ling 1973). Objects belonging to the same cluster are displayed in consecutive order. The placement of the clusters and the order of objects within each cluster are obtained by a seriation algorithm which tries to place large similarities/small dissimilarities close to the diagonal. Compact clusters are visible as dark squares (low dissimilarity) on the diagonal of the plot. Additionally, a silhouette plot (Rousseeuw 1987) is added. This visualization is similar to CLUSION (see Strehl and Ghosh 2003), but allows the use of arbitrary seriation algorithms.
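The idea of reordering plus shading can be sketched in base R. This is only an illustration: the leaf order of an average-link dendrogram is swapped in for a seriation method, and `image()` replaces the grid-based drawing, so the result is not what dissplot itself produces.

```r
d <- dist(iris[-5])
m <- as.matrix(d)

# Stand-in ordering (assumption): leaf order of an average-link dendrogram.
o <- hclust(d, method = "average")$order
m_ord <- m[o, o]

# Dark = low dissimilarity, light = high dissimilarity. Rows are flipped so
# that the diagonal runs from top-left to bottom-right.
n <- nrow(m_ord)
image(1:n, 1:n, t(m_ord[n:1, ]),
      col = gray(seq(0, 1, length.out = 64)),
      axes = FALSE, xlab = "", ylab = "",
      main = "Reordered dissimilarities (sketch)")
```

Groups of similar objects should show up as dark blocks along the diagonal once the rows and columns are permuted.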
dissplot(x, labels = NULL, method = NULL, control = NULL, options = NULL)
x |
an object of class dist . |
labels |
NULL or an integer vector of the same length as
rows/columns in x indicating the cluster membership for each
object in x as consecutive integers starting with one. The labels
are used to reorder the matrix. |
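Cluster labels produced elsewhere are not always consecutive integers starting with one. A minimal sketch of one way to recode them in base R follows; the vector `raw` is hypothetical.

```r
# Hypothetical raw labels that are neither consecutive nor start at one.
raw <- c(10, 10, 42, 7, 42, 7, 7)

# Matching against the sorted unique values yields integer codes 1..k.
labels <- match(raw, sort(unique(raw)))
print(labels)
```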
method |
a list with up to three elements or a single character string.
Use a single character string to apply the same algorithm to reorder
the clusters (inter cluster seriation) as well as the objects within each
cluster (intra cluster seriation).
If separate algorithms for inter and intra cluster seriation are required, method can be a list of two named elements
(inter_cluster and intra_cluster), each containing the name
of the respective seriation method. See seriate.dist for available
algorithms.
Set method to NA to plot the matrix as is (no or only coarse
seriation). For intra cluster reordering the special
method silhouette width is available. Objects in clusters are
then ordered by silhouette width (from silhouette plots).
If no method is given, the default method of seriate.dist
is used.
The third list element (named aggregation )
controls how inter cluster dissimilarities are computed from
the given dissimilarity matrix. The choices are
"avg" (average pairwise dissimilarities; average-link),
"min" (minimal pairwise dissimilarities; single-link),
"max" (maximal pairwise dissimilarities; complete-link), and
"Hausdorff" (pairs up each point from one cluster with the most
similar point from the other cluster and then uses the largest
dissimilarity of paired up points).
|
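The four aggregation choices can be illustrated on a toy example in base R. This is a sketch of the definitions as described above, not the package's internal code; the data are hypothetical, and the Hausdorff variant shown is the symmetric one (pairing in both directions), which is an assumption.

```r
# Toy data: two small clusters on a line (hypothetical values).
pts <- c(a1 = 0, a2 = 1, b1 = 4, b2 = 6)
labels <- c(1, 1, 2, 2)
m <- as.matrix(dist(pts))

# Pairwise dissimilarities between cluster 1 (rows) and cluster 2 (columns).
block <- m[labels == 1, labels == 2, drop = FALSE]

agg <- c(
  avg = mean(block),   # average-link
  min = min(block),    # single-link
  max = max(block),    # complete-link
  # Pair each point with its closest point in the other cluster (in both
  # directions), then take the largest dissimilarity among those pairs.
  Hausdorff = max(apply(block, 1, min), apply(block, 2, min))
)
print(agg)
```

For these points the four values differ (4.5, 3, 6 and 5), which shows that the Hausdorff choice is not the same as complete-link.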
control |
a list of control options passed on to the seriation
algorithm.
In case of two different seriation algorithms, control can
contain a list of two named elements (inter_cluster
and intra_cluster ) containing each a list with the control options
for the respective algorithm. |
options |
a list with options for plotting the matrix. The
list can contain the following elements:
|
An invisible object of class cluster_proximity_matrix
with the following
elements:
order |
NULL or integer vector giving the order used to plot
x . |
cluster_order |
NULL or integer vector giving the order
of the clusters as plotted. |
method |
vector of character strings indicating the seriation methods
used for plotting x . |
k |
NULL or integer scalar giving the number of clusters
generated. |
description |
a data.frame containing information (label, size,
average intra-cluster dissimilarity and the average silhouette) for the
clusters as displayed in the plot (from top/left to bottom/right). |
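The per-cluster quantities listed for description can be approximated by hand, which may help when interpreting them. The sketch below uses base R and the cluster package; the column names are hypothetical, and it assumes the intra-cluster average is taken over distinct pairs only, which may differ from what the package computes.

```r
library("cluster")  # recommended package, provides pam() and silhouette()

d <- dist(iris[-5])
l <- pam(d, 3, cluster.only = TRUE)
m <- as.matrix(d)
sil <- silhouette(l, d)
ks <- sort(unique(l))

# One row per cluster. Assumption: diagonal entries are excluded from the
# intra-cluster average.
desc <- data.frame(
  label = ks,
  size = as.vector(table(l)),
  avg_dissimilarity = sapply(ks, function(k) {
    block <- m[l == k, l == k, drop = FALSE]
    mean(block[upper.tri(block)])
  }),
  avg_silhouette = as.vector(tapply(sil[, "sil_width"], sil[, "cluster"], mean))
)
print(desc)
```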
This object can be used for plotting via plot(x, options = NULL, ...), where x is the object and options contains a list with plotting options (see above).
Ling, R.F. (1973): A computer generated aid for cluster analysis. Communications of the ACM, 16(6), 355–361.
Rousseeuw, P.J. (1987): Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20(1), 53–65.
Strehl, A. and Ghosh, J. (2003): Relationship-based clustering and visualization for high-dimensional data mining. INFORMS Journal on Computing, 15(2), 208–230.
dist (in package stats), package grid and seriate.
library("seriation")
library("grid")
library("cluster")

data("iris")
d <- dist(iris[-5])

## plot original matrix
res <- dissplot(d, method = NA)

## plot reordered matrix using the nearest insertion algorithm (from tsp)
res <- dissplot(d, method = "tsp", options = list(main = "Seriation (TSP)"))

## cluster with pam (we know iris has 3 clusters)
l <- pam(d, 3, cluster.only = TRUE)

## we use a grid layout to place several plots on a page
grid.newpage()
pushViewport(viewport(layout = grid.layout(nrow = 2, ncol = 2),
    gp = gpar(fontsize = 8)))
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 1))

## visualize the clustering
res <- dissplot(d, l, method = "chen",
    options = list(main = "PAM + Seriation (Chen) - standard",
        newpage = FALSE))

popViewport()
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 2))

## more visualization options
## color: use 10 shades of blue (hue = 260)
plot(res, options = list(main = "PAM + Seriation (Chen) - blue, only avg.",
    col = 10, hue = 260, averages = c(TRUE, TRUE), newpage = FALSE))

popViewport()
pushViewport(viewport(layout.pos.row = 2, layout.pos.col = 1))

## threshold and cubic scale to highlight differences
plot(res, options = list(main = "PAM + Seriation (Chen) - threshold",
    threshold = 1.5, power = 3, newpage = FALSE))

popViewport()
pushViewport(viewport(layout.pos.row = 2, layout.pos.col = 2))

## use custom (logistic) scale
plot(res, options = list(main = "PAM + Seriation (Chen) - logistic scale",
    col = hcl(c = 0, l = (plogis(seq(0, 10, length.out = 100),
        location = 2, scale = 1/2, log = FALSE)) * 100),
    newpage = FALSE))

popViewport(2)

## the reordered_cluster_dissimilarity_matrix object
res
names(res)