wcls/bcls.matrix {clv}          R Documentation

Matrix Cluster Scatter Measures

Description

These functions compute two basic matrix cluster scatter measures: the within-cluster scatter matrix and the between-cluster scatter matrix.

Usage

wcls.matrix(data,clust,cluster.center)
bcls.matrix(cluster.center,cluster.size,mean)

Arguments

data numeric matrix or data.frame where columns correspond to variables and rows to observations.
clust integer vector giving the id of the cluster each object is assigned to. If the vector is not of integer type, it is coerced with a warning.
cluster.center matrix or data.frame where columns correspond to variables and rows to cluster centers, as defined by the data and clust parameters.
cluster.size integer vector giving the size of each cluster, computed from the clust vector.
mean mean of all data objects.
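
As a rough illustration of the shapes these arguments describe, the sketch below builds them by hand in base R. It does not use clv. Using the iris species labels as cluster ids is an assumption made only for this example, and the variable name data.mean is a hypothetical stand-in for the mean argument.

```r
# Illustrative sketch only: base R, no clv functions involved.
x  <- as.matrix(iris[, 1:4])          # rows are observations, columns are variables
cl <- as.integer(iris$Species)        # assumed cluster ids (1, 2, 3) for this example

# size of each cluster, ordered by cluster id
cluster.size <- as.integer(table(cl))

# one row per cluster, one column per variable:
# per-cluster column sums divided by the matching cluster size
cluster.center <- rowsum(x, cl) / cluster.size

# mean of all data objects, one value per variable
data.mean <- colMeans(x)
```

In the package itself these quantities come from cls.attrib, as in the Examples section.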

Details

There are two base matrix scatter measures.

1. within-cluster scatter measure defined as:

W = sum(forall k in 1:cluster.num) W(k)

where W(k) = sum(forall x in C(k)) (x - m(k))*(x - m(k))'

x - an object belonging to cluster k (an element of C(k)),
m(k) - center of cluster k.

2. between-cluster scatter measure defined as:

B = sum(forall k in 1:cluster.num) |C(k)|*( m(k) - m )*( m(k) - m )'

|C(k)| - size of cluster k,
m(k) - center of cluster k,
m - center of all data objects.
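
The two definitions above can be sketched directly in base R. This is a minimal illustration under stated assumptions, not the package's implementation: scatter_within and scatter_between are hypothetical helper names, cluster centers are taken to be per-cluster column means, and the iris species labels stand in for cluster ids.

```r
# Illustrative sketch only: hypothetical helpers, not clv functions.

# W = sum over clusters k of sum over x in cluster k of (x - m(k)) (x - m(k))'
scatter_within <- function(data, clust) {
  x <- as.matrix(data)
  W <- matrix(0, ncol(x), ncol(x))
  for (k in unique(clust)) {
    xk <- x[clust == k, , drop = FALSE]
    d  <- sweep(xk, 2, colMeans(xk))  # each row is x - m(k)
    W  <- W + crossprod(d)            # t(d) %*% d sums the outer products
  }
  W
}

# B = sum over clusters k of |C(k)| (m(k) - m) (m(k) - m)'
scatter_between <- function(data, clust) {
  x <- as.matrix(data)
  m <- colMeans(x)                    # center of all data objects
  B <- matrix(0, ncol(x), ncol(x))
  for (k in unique(clust)) {
    xk <- x[clust == k, , drop = FALSE]
    d  <- colMeans(xk) - m            # m(k) - m
    B  <- B + nrow(xk) * outer(d, d)  # weighted by cluster size |C(k)|
  }
  B
}

x  <- as.matrix(iris[, 1:4])
cl <- as.integer(iris$Species)        # assumed cluster ids for this example

W <- scatter_within(x, cl)
B <- scatter_between(x, cl)

# total scatter about the overall mean; equals W + B when m(k) are cluster means
total <- crossprod(sweep(x, 2, colMeans(x)))
all.equal(W + B, total, check.attributes = FALSE)
```

Both matrices are square with one row and column per variable. The final comparison checks the decomposition of the total scatter into within-cluster and between-cluster parts.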

Value

wcls.matrix returns the W matrix (within-cluster scatter measure),
bcls.matrix returns the B matrix (between-cluster scatter measure).

Author(s)

Lukasz Nieweglowski

References

T. Hastie, R. Tibshirani, G. Walther Estimating the number of data clusters via the Gap statistic, http://citeseer.ist.psu.edu/tibshirani00estimating.html

Examples

# load and prepare data
library(clv)
data(iris)
iris.data <- iris[,1:4]

# cluster data
pam.mod <- pam(iris.data,5) # create five clusters
v.pred <- as.integer(pam.mod$clustering) # get cluster ids associated with the given data objects

# compute cluster sizes, the center of each cluster
# and the mean of all data objects
cls.attr <- cls.attrib(iris.data, v.pred)
center <- cls.attr$cluster.center
size <- cls.attr$cluster.size
iris.mean <- cls.attr$mean

# compute matrix scatter measures
W.matrix <- wcls.matrix(iris.data, v.pred, center)
B.matrix <- bcls.matrix(center, size, iris.mean)
T.matrix <- W.matrix + B.matrix

# examples of indices based on the W, B and T matrices
mx.scatt.crit1 <- sum(diag(W.matrix))
mx.scatt.crit2 <- sum(diag(B.matrix))/sum(diag(W.matrix))
mx.scatt.crit3 <- det(W.matrix)/det(T.matrix)

[Package clv version 0.2 Index]