nncluster {nnclust} | R Documentation |
Uses Prim's algorithm to build a minimum spanning tree for each
cluster, stopping when the nearest-neighbour distance rises above a
specified threshold. Returns a set of clusters and a set of 'outliers'
not in any cluster. trimCluster tidies up the output by removing
small clusters, and clusterMember returns cluster membership for the
original data points.
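The tree-growing step described above can be sketched in base R. This is only an illustration under stated assumptions: grow_cluster is a hypothetical helper, not part of nnclust, and it uses brute-force distances rather than the package's nearest-neighbour finder.

```r
# Hypothetical helper (not from nnclust): grow one tree from `start`,
# Prim-style, using brute-force squared Euclidean distances.
grow_cluster <- function(x, threshold, start = 1) {
  x <- as.matrix(x)
  n <- nrow(x)
  sqdist_to <- function(i) {
    rowSums((x - matrix(x[i, ], n, ncol(x), byrow = TRUE))^2)
  }
  in_tree <- rep(FALSE, n)
  in_tree[start] <- TRUE
  best <- sqdist_to(start)   # squared distance from each point to the tree
  from <- rep(start, n)      # tree member that achieves that distance
  edges <- matrix(integer(0), ncol = 2)
  repeat {
    cand <- which(!in_tree)
    if (length(cand) == 0) break
    j <- cand[which.min(best[cand])]
    if (best[j] > threshold) break   # closest remaining point is too far
    edges <- rbind(edges, c(from[j], j))
    in_tree[j] <- TRUE
    d <- sqdist_to(j)
    closer <- !in_tree & d < best
    best[closer] <- d[closer]
    from[closer] <- j
  }
  list(rows = which(in_tree), edges = edges)
}

# Two well-separated groups on a line; the tree stays in the first group
x <- matrix(c(0, 0.1, 0.2, 5, 5.1), ncol = 1)
cl <- grow_cluster(x, threshold = 0.05)
cl$rows
```

Restarting the same procedure from a point outside cl$rows would give the next cluster; points that never join a tree correspond to the 'outliers'.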
nncluster(x, threshold, fill = 0.95, maxclust = 20, give.up = 500, verbose = FALSE, start = NULL)
trimCluster(nnclust, size = 10)
clusterMember(nnclust, outlier = TRUE)
nearestCluster(nnclust, threshold = Inf, outlier = FALSE)
x |
data matrix |
threshold |
Threshold for stopping the tree building within a cluster. The tree
stops when the squared Euclidean distance to the closest point to the
tree is greater than this. If threshold is a vector, the elements
will be used in succession, with the last element repeated as necessary. |
fill |
Stop when the clusters make up this fraction of the data. |
maxclust |
Stop at this many clusters |
give.up |
Stop when fewer than this many pairs have nearest-neighbour distance
less than threshold. |
verbose |
Print some cluster summaries before restarting? |
nnclust |
An object of class nncluster, returned by nncluster |
size |
Clusters smaller than this are added to the 'outlier' set |
outlier |
If FALSE, use NA for the cluster identifier for outliers |
start |
Integer index of the observation at which to start the minimum spanning tree |
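The recycling rule for a vector threshold can be illustrated with a one-line base R sketch. It assumes that "in succession" means one element per successive cluster; threshold_for is a hypothetical helper, not an nnclust function.

```r
# Hypothetical helper: threshold applied to the k-th cluster, assuming one
# element is consumed per cluster and the last element is then repeated.
threshold_for <- function(threshold, k) threshold[min(k, length(threshold))]

th <- sapply(1:4, threshold_for, threshold = c(0.1, 0.2, 0.5))
th
```

With three thresholds and four clusters, the fourth cluster reuses the last value.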
Works best for well-separated clusters in up to 8 dimensions, and sample sizes up to hundreds of thousands.
If you want a complete minimum spanning tree, run mst on the
outlier set and then use nnfind to find the shortest links
connecting the clusters. When there are well-separated clusters this
will be faster than running mst once on the whole data set.
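What "shortest link connecting the clusters" means can be shown with a brute-force base R sketch. shortest_link is a hypothetical helper; it deliberately does not call nnfind, whose arguments are not described here, and it is only practical for small point sets.

```r
# Hypothetical helper: shortest link between two point sets, by brute force.
shortest_link <- function(a, b) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  # Squared distances via |a|^2 + |b|^2 - 2 a.b
  d2 <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * a %*% t(b)
  idx <- arrayInd(which.min(d2), dim(d2))
  list(from = idx[1, 1], to = idx[1, 2], dist2 = max(d2[idx], 0))
}

a <- rbind(c(0, 0), c(1, 0))   # rows of one cluster
b <- rbind(c(3, 0), c(4, 4))   # rows of another cluster
link <- shortest_link(a, b)
link
```

Adding one such link per pair of neighbouring trees is what joins the per-cluster trees into a single spanning tree.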
clusterMember returns a vector of integers indicating cluster membership. Outliers are treated as a separate cluster if outlier is TRUE; otherwise they are coded as NA.
nearestCluster assigns outliers at distance less than threshold from a cluster to the cluster whose nearest member is closest.
trimCluster returns a new nncluster object with small clusters converted to outliers. There must be at least one cluster larger than size.
A list of class nncluster. Each element but the last
describes a cluster, with components mst containing the tree,
x containing the data, and rows containing row numbers in
the initial data set.
The last element describes the unclustered outliers and has no
mst component.
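The list structure above is enough to recover a membership vector by hand. The sketch below uses a mock list with only the rows component. It is not produced by nnclust, and member_sketch is a hypothetical stand-in for clusterMember. The integer label given to outliers here is an assumption.

```r
# Mock object following the structure described above (not real nnclust output):
# two clusters, then a final element holding the outliers.
mock <- list(
  list(rows = c(1, 2, 5)),
  list(rows = c(3, 6)),
  list(rows = 4)
)

# Hypothetical stand-in for clusterMember: n is the number of original rows.
member_sketch <- function(obj, n, outlier = TRUE) {
  id <- rep(NA_integer_, n)
  k <- length(obj)
  for (i in seq_len(k - 1)) id[obj[[i]]$rows] <- i
  # Assumption: outliers get the next unused integer label when outlier = TRUE
  if (outlier) id[obj[[k]]$rows] <- k
  id
}

with_outliers <- member_sketch(mock, 6)
without_outliers <- member_sketch(mock, 6, outlier = FALSE)
```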
The performance of this algorithm depends critically on the performance of the nearest-neighbour finder, and can decay catastrophically if too many uninformative variables are added.
The performance can also be poor if the data are close to being ordered on some of the variables.
Thomas Lumley
x <- scale(faithful)
a <- nncluster(x, threshold = 0.1, give.up = 0, fill = 1)
a
id <- clusterMember(a)
plot(faithful, col = id, pch = 19)