dissplot {seriation}    R Documentation

Dissimilarity Plot

Description

Visualizes a dissimilarity matrix using seriation and matrix shading. Entries with lower dissimilarities (higher similarity) are plotted darker. Such a plot can be used to uncover hidden structure in the data.

The plot can also be used to visualize cluster quality (see Ling 1973). Objects belonging to the same cluster are displayed in consecutive order. The placement of clusters and the within-cluster order are obtained by a seriation algorithm which tries to place large similarities/small dissimilarities close to the diagonal. Compact clusters are visible as dark squares (low dissimilarity) on the diagonal of the plot. Additionally, a silhouette plot (Rousseeuw 1987) is added. This visualization is similar to CLUSION (see Strehl and Ghosh 2003) but allows for arbitrary seriation algorithms.

Usage

dissplot(x, labels = NULL, method = NULL, control = NULL, options = NULL)   

Arguments

x an object of class dist.
labels NULL or an integer vector of the same length as rows/columns in x indicating the cluster membership for each object in x as consecutive integers starting with one. The labels are used to reorder the matrix.
method a single character string indicating the seriation algorithm to use (NA to plot the matrix as is). The same algorithm is used to reorder the clusters (inter cluster seriation) as well as the objects within each cluster (intra cluster seriation).
If separate algorithms for inter and intra cluster seriation are required, method can be a list of two named elements (inter_cluster and intra_cluster), each containing the name of the respective seriation method. See seriate.dist for available algorithms.
For intra cluster reordering the special method silhouette width is available. Objects in clusters are then ordered by silhouette width (the standard for silhouette plots).
If no method is given, the default method of seriate.dist is used.
control a list of control options passed on to the seriation algorithm. If two different seriation algorithms are used, control can contain a list of two named elements (inter_cluster and intra_cluster), each containing a list with the control options for the respective algorithm.
options a list with options for plotting the matrix. The list can contain the following elements:
plot
a logical indicating whether a plot should be produced. If FALSE, the returned object can be plotted later using the function plot, which takes a list of plotting options as its second argument (see the elements of this list).
cluster_labels
a logical indicating whether to display cluster labels in the plot.
averages
a logical indicating whether to display the average pair-wise dissimilarity between clusters instead of the individual dissimilarities in the lower triangle of the plot.
lines
a logical indicating whether to draw lines to separate clusters.
silhouettes
a logical indicating whether to include a silhouette plot (see Rousseeuw, 1987).
threshold
a numeric. If used, only dissimilarities below the threshold are plotted.
main
title for the plot.
col
colors used for the image plot (default: 100 shades of gray using the hcl colorspace with hcl(h = 0, c = 0, l = seq(20, 95, len = 100))). If col is a single number, it specifies the number of gray values used in the plot.
colorkey
a logical indicating whether to place a color key below the plot.
lines_col
color used for the lines to separate clusters.
newpage
a logical indicating whether to start plot on a new page (see grid.newpage in package grid).
pop
a logical indicating whether to pop the created viewports (see package grid).
gp
an object of class gpar containing graphical parameters (see gpar in package grid).

Value

An invisible object of class cluster_proximity_matrix with the following elements:

order NULL or integer vector giving the order used to plot x.
cluster_order NULL or integer vector giving the order of the clusters as plotted.
method vector of character strings indicating the seriation methods used for plotting x.
k NULL or integer scalar giving the number of clusters generated.
description a data.frame containing information (label, size, average intra-cluster dissimilarity and the average silhouette) for the clusters as displayed in the plot (from top/left to bottom/right).


This object can be used for plotting via plot(x, options = NULL, ...), where x is the object and options contains a list with plotting options (see above).

References

Ling, R.F. (1973): A computer generated aid for cluster analysis. Communications of the ACM, 16(6), 355–361.

Rousseeuw, P.J. (1987): Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20(1), 53–65.

Strehl, A. and Ghosh, J. (2003): Relationship-based clustering and visualization for high-dimensional data mining. INFORMS Journal on Computing, 15(2), 208–230.

See Also

dist (in package stats), package grid and seriate.

Examples

data("iris")
d <- dist(iris[-5])

## plot original matrix
res <- dissplot(d, method = NA)

## plot reordered matrix using the nearest insertion algorithm (from tsp)
res <- dissplot(d, method = "tsp",
    options = list(main = "Seriation (TSP)"))
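
## a sketch of deferred plotting, based on the plot option described
## under Arguments: the seriation is computed without drawing and the
## returned object is plotted afterwards (res_tsp is just an
## illustrative name)
res_tsp <- dissplot(d, method = "tsp", options = list(plot = FALSE))
plot(res_tsp, options = list(main = "Seriation (TSP) - plotted later"))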

## cluster with pam (we know iris has 3 clusters)
library("cluster")
l <- pam(d, 3, cluster.only = TRUE)
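
## for orientation, a hand-computed sketch (base R only, and not
## necessarily how dissplot computes it internally) of the average
## pair-wise dissimilarity between clusters, the quantity the
## averages option refers to; note that the diagonal entries here
## include the zero self-dissimilarities
m <- as.matrix(d)
k <- max(l)
avg <- sapply(seq_len(k), function(j)
    sapply(seq_len(k), function(i) mean(m[l == i, l == j])))
round(avg, 2)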

## we use a grid layout to place several plots on a page
library("grid")
grid.newpage()
pushViewport(viewport(layout=grid.layout(nrow = 2, ncol = 2), 
    gp = gpar(fontsize = 8)))
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 1))

## visualize the clustering
res <- dissplot(d, l, method = "chen",  
    options = list(main = "PAM + Seriation (Chen) - standard", 
    newpage = FALSE))

popViewport()
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 2))

## more visualization options
## threshold
plot(res, options = list(main = "PAM + Seriation (Chen) - threshold", 
    threshold = 1.5, newpage = FALSE))

popViewport()
pushViewport(viewport(layout.pos.row = 2, layout.pos.col = 1))

## color: use 10 shades of blue
plot(res, options = list(main = "PAM + Seriation (Chen) - blue", 
    col = hcl(h = 260, c = seq(75, 0, length.out = 10),
        l = seq(30, 95, length.out = 10)),
    gp = gpar(fill = hcl(h = 260, c = 30, l = 80)), newpage = FALSE))

popViewport()
pushViewport(viewport(layout.pos.row = 2, layout.pos.col = 2))

## suppress averages in lower triangle
plot(res, options = list(main = "PAM + Seriation (Chen) - no avg.", 
    averages = FALSE, newpage = FALSE))

popViewport(2)
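
## a sketch of separate inter and intra cluster seriation (see argument
## method): clusters are arranged with "chen", while objects within each
## cluster are ordered by silhouette width; the method string
## "silhouette width" is taken from the argument description and
## res_sil is just an illustrative name
res_sil <- dissplot(d, l,
    method = list(inter_cluster = "chen",
        intra_cluster = "silhouette width"),
    options = list(main = "PAM + Chen / silhouette width"))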

## the returned object (see Value)
res 
names(res)
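
## the elements listed under Value can be accessed by name, e.g., the
## plotting order of the objects, the order of the clusters and the
## per-cluster summary
head(res$order)
res$cluster_order
res$description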

[Package seriation version 1.0-0 Index]