diffusionKmeans {diffusionMap}    R Documentation

Diffusion K-means

Description

Clusters a data set based on its diffusion coordinates.

Usage

diffusionKmeans(dmap, K, params = c(), Niter = 50, epsilon = 0.001)

Arguments

dmap a "dmap" object, computed by diffuse()
K number of clusters
params optional parameters for each data point, given as a vector of length n or a matrix with n rows, where n is the number of data points. If this argument is given, cluster centroid parameters are returned.
Niter number of K-means iterations performed.
epsilon stopping criterion for relative change in distortion for each K-means iteration
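
One way to read Niter and epsilon together, inferred from the Value section's mention of "K-means runs", is sketched below in base R: several independent K-means runs, each stopped once the relative change in distortion is at most epsilon, with the lowest-distortion run kept. The helpers kmeans_run and best_of_runs and the max_steps cap are hypothetical names introduced for illustration; this is not the package's source, and the package may weight points differently.

```r
# Hypothetical helpers, not part of diffusionMap.

# One K-means run: iterate until the relative drop in distortion is at
# most epsilon. max_steps is an added safety cap, not a package argument.
kmeans_run <- function(X, K, epsilon = 0.001, max_steps = 100) {
  n <- nrow(X)
  cent <- X[sample.int(n, K), , drop = FALSE]   # random initial centroids
  D_old <- Inf
  for (step in seq_len(max_steps)) {
    if (step > 1) {
      # move each centroid to the mean of the points currently assigned to it
      for (k in seq_len(K)) {
        idx <- part == k
        if (any(idx)) cent[k, ] <- colMeans(X[idx, , drop = FALSE])
      }
    }
    # n by K matrix of squared Euclidean distances to each centroid
    DK <- vapply(seq_len(K),
                 function(k) rowSums(sweep(X, 2, cent[k, ])^2),
                 numeric(n))
    part <- max.col(-DK, ties.method = "first")  # label = nearest centroid
    D <- sum(DK[cbind(seq_len(n), part)])        # total (unweighted) distortion
    # stop once the relative change in distortion is at most epsilon
    if (is.finite(D_old) && (D_old - D) <= epsilon * D_old) break
    D_old <- D
  }
  list(part = part, cent = cent, D = D, DK = DK)
}

# Niter independent runs; keep the one with the smallest distortion.
best_of_runs <- function(X, K, Niter = 50, epsilon = 0.001) {
  X <- as.matrix(X)
  runs <- lapply(seq_len(Niter), function(i) kmeans_run(X, K, epsilon))
  runs[[which.min(vapply(runs, function(r) r$D, numeric(1)))]]
}
```

Under this reading, a larger Niter lowers the chance of keeping a poor local minimum, while a smaller epsilon lets each run iterate longer before stopping.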

Details

The input is a "dmap" object computed by diffuse(), so diffuse() must be run first. The function is written this way so that the K-means parameters can be varied without recomputing the diffusion map coordinates on each run.
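
The reuse pattern described above can be sketched as follows: the diffusion map is computed once and passed to the clustering function for several values of K. The helper scan_K is a hypothetical name, not part of diffusionMap; it only assumes that the clustering function returns a list with a scalar component D, as documented under Value.

```r
# Hypothetical helper, not part of diffusionMap: run a clustering function
# for several K on one precomputed diffusion map and collect the distortion D.
scan_K <- function(dmap, Ks, cluster_fun, ...) {
  fits <- lapply(Ks, function(K) cluster_fun(dmap, K, ...))
  data.frame(K = Ks, D = vapply(fits, function(f) f$D, numeric(1)))
}

# Intended use (requires the diffusionMap package and its annulus data):
# dmap <- diffuse(dist(annulus), 0.03)   # computed once
# scan_K(dmap, Ks = 2:5, cluster_fun = diffusionKmeans, Niter = 25)
```

Passing the clustering function as an argument keeps the sketch independent of the package, so it can be tried with any function that has the same calling convention.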

Value

The returned value is a list with components

part final labelling of the data from K-means: an n-dimensional vector of integers between 1 and K
cent K geometric centroids found by K-means
D minimum of total distortion (loss function of K-means) found across K-means runs
DK n by K matrix of squared (Euclidean) distances from each point to every centroid for the optimal K-means run
centparams optional parameters for each centroid, returned only if params is specified in the function call. A matrix with K rows.
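
The page does not say how centparams is derived from params. One plausible reading, sketched here in base R, is a per-cluster average of the per-point parameters, optionally weighted. The helper centroid_params and its weights argument are hypothetical and not part of diffusionMap; the package's actual weighting may differ.

```r
# Hypothetical helper, not part of diffusionMap: average per-point
# parameters within each cluster to get one row of parameters per centroid.
centroid_params <- function(params, part, K, weights = NULL) {
  params <- as.matrix(params)            # a vector of length n becomes n by 1
  if (is.null(weights)) weights <- rep(1, nrow(params))
  out <- matrix(NA_real_, nrow = K, ncol = ncol(params))
  for (k in seq_len(K)) {
    idx <- which(part == k)
    if (length(idx) > 0) {
      w <- weights[idx] / sum(weights[idx])              # normalise within cluster
      out[k, ] <- colSums(params[idx, , drop = FALSE] * w)  # weighted mean per column
    }
  }
  out                                    # K rows, one per centroid
}
```

With equal weights this reduces to the plain cluster mean of each parameter; empty clusters are left as NA.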

Author(s)

Joseph Richards jwrichar@stat.cmu.edu

References

Lafon, S., & Lee, A., (2006), IEEE Trans. Pattern Anal. and Mach. Intel., 28, 1393

Richards, J. W., Freeman, P. E., Lee, A. B., Schafer, C. M., (2009), ApJ, 691, 32

See Also

diffuse, distortionMin

Examples

## example with annulus data set
data(annulus)
par(mfrow=c(2,1))
plot(annulus,main="Annulus Data",pch=20,cex=.7)
D = dist(annulus) # use Euclidean distance
dmap = diffuse(D,0.03) # compute diffusion map
k=2  # number of clusters
dkmeans = diffusionKmeans(dmap, k,Niter=25)
plot(annulus,main="Colored by diffusion K-means clustering",pch=20,
   cex=.7,col=dkmeans$part)

## example with Chainlink data set
library(scatterplot3d) # provides scatterplot3d(), used for the 3-D plots below
data(Chainlink)
lab.col = c(rep("red",500),rep("blue",500)); n=1000
scatterplot3d(Chainlink$C1,Chainlink$C2,Chainlink$C3,color=lab.col,
   main="Chainlink Data") # plot Chainlink data
D = dist(Chainlink) # use Euclidean distance
dmap = diffuse(D,neigen=3,eps.val=.01) # compute diffusion map & plot
plot(dmap)
print(dmap)
dkmeans = diffusionKmeans(dmap, K=2, Niter=25)
col.dkmeans=ifelse(dkmeans$part==1,"red","blue")
scatterplot3d(Chainlink$C1,Chainlink$C2,Chainlink$C3,color=col.dkmeans,
   main="Chainlink Data, colored by diffusion K-means classification")


[Package diffusionMap version 0.0-2 Index]