clustering {clues}    R Documentation

Data Clustering (After Data Shrinking)

Description

Data clustering (after data shrinking).

Usage

clustering(y, disMethod = "Euclidean")

Arguments

y data matrix: an R matrix object (for dimension > 1) or a vector object (for dimension = 1), with rows as observations and columns as variables.
disMethod specification of the dissimilarity measure. The available measures are “Euclidean” and “1-corr”.

Details

We first store the first observation (data point) in point[1]. We then find the nearest neighbor of point[1], store it in point[2], and store the dissimilarity between point[1] and point[2] in db[1]. Next, we remove point[1], find the nearest neighbor of point[2] among the remaining points, store it in point[3], and store the dissimilarity between point[2] and point[3] in db[2]. We then remove point[2] and find the nearest neighbor of point[3]. We repeat this procedure until we have point[n] and db[n-1], where n is the total number of data points.
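
The chain construction described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the code used by the package: nn_chain is a hypothetical helper, the dissimilarity is taken to be Euclidean (computed with dist), and point is assumed to hold row indices of y.

```r
# Hypothetical helper for illustration only; not part of the clues package.
nn_chain <- function(y) {
  y <- as.matrix(y)                  # a vector becomes a one-column matrix
  n <- nrow(y)
  d <- as.matrix(dist(y))            # Euclidean dissimilarities (assumption)
  point <- integer(n)                # row indices in visiting order (assumption)
  db <- numeric(n - 1)               # dissimilarities between consecutive points
  remaining <- rep(TRUE, n)          # TRUE for points not yet removed
  point[1] <- 1L                     # start from the first observation
  for (i in seq_len(n - 1)) {
    cur <- point[i]
    remaining[cur] <- FALSE          # remove the current point
    cand <- which(remaining)         # candidates for the nearest neighbor
    nxt <- cand[which.min(d[cur, cand])]
    point[i + 1] <- nxt
    db[i] <- d[cur, nxt]
  }
  list(point = point, db = db)
}

# One-dimensional toy data with two well-separated groups
y <- c(0, 1, 2, 10, 11)
res <- nn_chain(y)
res$point   # 1 2 3 4 5
res$db      # 1 1 8 1
```

In this toy example the large value in db (8) is the jump from the first group to the second.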

Next, we calculate the interquartile range (IQR) of the vector db. We then check which elements of db are larger than avg + 1.5 * IQR, where avg is the average of the vector db; these elements are the outlier dissimilarities. The minimum of the outlier dissimilarities is stored in omin. The estimated number of clusters is g, where g-1 is the number of outlier dissimilarities. The position of an outlier dissimilarity marks the end of one cluster and the start of a new cluster.
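
The outlier rule can likewise be sketched in base R. Again this is a hedged illustration rather than the package's implementation: split_chain is a hypothetical helper, IQR is computed with R's default quantile type, returning NA for omin when there are no outliers is an assumption, and label gives cluster labels along the chain (in visiting order), not in the original row order.

```r
# Hypothetical helper for illustration only; not part of the clues package.
split_chain <- function(db) {
  cutoff <- mean(db) + 1.5 * IQR(db)      # avg + 1.5 * IQR
  is_out <- db > cutoff                   # outlier dissimilarities
  g <- sum(is_out) + 1                    # g - 1 outliers give g clusters
  # NA when no outlier is found (assumption; the package may differ)
  omin <- if (any(is_out)) min(db[is_out]) else NA_real_
  # An outlier at position i ends one cluster; chain position i + 1 starts the next
  label <- cumsum(c(TRUE, is_out))
  list(g = g, omin = omin, cutoff = cutoff, label = label)
}

# Dissimilarities along a chain of five points with one large jump
db <- c(1, 1, 8, 1)
res <- split_chain(db)
res$g       # 2
res$omin    # 8
res$label   # 1 1 1 2 2
```

If point holds the visiting order as row indices, labels in the original row order could be recovered with mem[point] <- label.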

To get a reasonable clustering result, data sharpening (shrinking) is recommended before data clustering.

Value

mem vector of the cluster membership of data points. The cluster membership takes values 1, 2, ..., g, where g is the estimated number of clusters.
size vector of the number of data points for clusters.
g an estimate of the number of clusters.
db vector of dissimilarities between consecutive data points (see Details).
point vector of consecutive data points (see Details).
omin the minimum value of the outlier dissimilarities (see Details).

References

Wang, S., Qiu, W., and Zamar, R. H. (2007). CLUES: A non-parametric clustering method based on local shrinking. Computational Statistics & Data Analysis, Vol. 52, issue 1, pages 286-298.

See Also

shrinking

Examples

  # ruspini data
  data(Ruspini)
  # data matrix
  ruspini <- Ruspini$ruspini
  
  tt <- clustering(ruspini)
  plotClusters(ruspini, tt$mem)

[Package clues version 0.3.2 Index]