replication.Mod {clusterSim}    R Documentation

Modification of replication analysis for cluster validation

Description

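Modification of replication analysis for cluster validation. The stability of a partition is assessed by replication analysis (Breckenridge, 2000) and summarized by the mean value of the corrected Rand index over S simulations.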

Usage

replication.Mod(x, v="m", u=2, centrotypes="centroids", 
        normalization=NULL, distance=NULL, method="kmeans", S=10)

Arguments

x data matrix
v type of data: metric ("r" - ratio, "i" - interval, "m" - mixed), nonmetric ("o" - ordinal, "n" - multi-state nominal, "b" - binary)
u number of clusters, set arbitrarily by the user
centrotypes "centroids" or "medoids"
normalization optional normalization formula for metric data:
for ratio data: "n6" - (x/sd), "n7" - (x/range), "n8" - (x/max), "n9" - (x/mean), "n10" - (x/sum), "n11" - (x/sqrt(SSQ))
for interval or mixed data: "n1" - (x-mean)/sd, "n2" - (x-Me)/MAD, "n3" - (x-mean)/range, "n4" - (x-min)/range, "n5" - (x-mean)/max[abs(x-mean)]
distance distance measure:
NULL for the "kmeans" method (which works directly on the data matrix),
for ratio data: "d1" - Manhattan, "d2" - Euclidean, "d3" - Chebychev (max), "d4" - squared Euclidean, "d5" - GDM1, "d6" - Canberra, "d7" - Bray-Curtis
for interval or mixed (ratio & interval) data: "d1", "d2", "d3", "d4", "d5"
for ordinal data: "d8" - GDM2
for multi-state nominal: "d9" - Sokal & Michener
for binary data: "b1" - Jaccard, "b2" - Sokal & Michener, "b3" - Sokal & Sneath (1), "b4" - Rogers & Tanimoto, "b5" - Czekanowski, "b6" - Gower & Legendre (1), "b7" - Ochiai, "b8" - Sokal & Sneath (2), "b9" - Phi of Pearson, "b10" - Gower & Legendre (2)
method clustering method: "kmeans" (default), "single", "complete", "average", "mcquitty", "median", "centroid", "ward", "pam"
S the number of simulations used to compute the mean corrected Rand index
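To make some of the normalization and distance codes above concrete, the base R sketch below implements a few of them. The helpers normalize_sketch and distance_sketch are hypothetical and are not part of clusterSim; the package's own data.Normalization and dist.* functions may differ in details such as the handling of constant columns or zero denominators. Only formulas stated in the argument list are used, and Bray-Curtis and Canberra are assumed to be applied to non-negative (ratio) data.

```r
# Hypothetical helpers (NOT clusterSim functions) illustrating a few of the
# normalization and distance codes listed above, using base R only.

# Apply one normalization formula to every column of a data matrix.
normalize_sketch <- function(x, type = c("n1", "n4", "n6", "n11")) {
  type <- match.arg(type)
  f <- switch(type,
    n1  = function(v) (v - mean(v)) / sd(v),          # (x - mean)/sd
    n4  = function(v) (v - min(v)) / diff(range(v)),  # (x - min)/range
    n6  = function(v) v / sd(v),                      # x/sd (ratio data)
    n11 = function(v) v / sqrt(sum(v^2)))             # x/sqrt(SSQ) (ratio data)
  apply(as.matrix(x), 2, f)
}

# Compute one of the listed distances between rows; returns a "dist" object.
distance_sketch <- function(z, type = c("d1", "d2", "d3", "d4", "d6", "d7")) {
  type <- match.arg(type)
  z <- as.matrix(z)
  switch(type,
    d1 = dist(z, method = "manhattan"),
    d2 = dist(z, method = "euclidean"),
    d3 = dist(z, method = "maximum"),        # Chebychev (max)
    d4 = as.dist(as.matrix(dist(z))^2),      # squared Euclidean
    d6 = dist(z, method = "canberra"),       # Canberra, non-negative data assumed
    d7 = {                                   # Bray-Curtis, non-negative data assumed
      n <- nrow(z)
      m <- matrix(0, n, n)
      for (i in seq_len(n)) for (j in seq_len(n))
        m[i, j] <- sum(abs(z[i, ] - z[j, ])) / sum(z[i, ] + z[j, ])
      as.dist(m)
    })
}

# Usage: normalize a small ratio-type matrix, then compute Euclidean distances.
x <- matrix(c(1, 2, 3, 4, 2, 4, 6, 8), ncol = 2)
d <- distance_sketch(normalize_sketch(x, "n6"), "d2")
```

Normalizing before computing distances mirrors the order implied by the normalization and distance arguments; with method="kmeans" the distance step is skipped because k-means works on the data matrix itself.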

Details

See file $R_HOME\library\clusterSim\pdf\replication.Mod_details.pdf for further details
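Because the full description lives in an external PDF, the following base R sketch gives one plausible reading of the procedure. It is inferred from the returned components (A, B, centroid, clusteringA, clusteringB, clusteringBB, cRand) and from replication analysis as described by Breckenridge (2000): split the objects at random into samples A and B, cluster both, assign each object of B to the nearest centroid of A, and compare that assignment with the clustering of B using the corrected Rand index. It covers only method="kmeans" with centroids and no normalization. The names replication_sketch and corrected_rand are hypothetical, and the half split and the squared Euclidean nearest-centroid rule are assumptions, not the package's documented behaviour.

```r
# Corrected (adjusted) Rand index of Hubert and Arabie (1985).
corrected_rand <- function(p1, p2) {
  tab <- table(p1, p2)
  comb2 <- function(k) k * (k - 1) / 2
  sum_ij <- sum(comb2(tab))
  sum_a <- sum(comb2(rowSums(tab)))
  sum_b <- sum(comb2(colSums(tab)))
  expected <- sum_a * sum_b / comb2(length(p1))
  (sum_ij - expected) / ((sum_a + sum_b) / 2 - expected)
}

# Hypothetical sketch of a replication analysis with k-means (NOT clusterSim code).
replication_sketch <- function(x, u = 2, S = 10) {
  x <- as.matrix(x)
  n <- nrow(x)
  crand <- numeric(S)
  for (s in seq_len(S)) {
    idx <- sample(n, floor(n / 2))            # assumption: random split into halves
    A <- x[idx, , drop = FALSE]
    B <- x[-idx, , drop = FALSE]
    clA <- kmeans(A, centers = u, nstart = 10)           # clustering of A + centroids
    clB <- kmeans(B, centers = u, nstart = 10)$cluster   # clustering of B
    # Assign each B object to its nearest A centroid (squared Euclidean distance).
    d2 <- sapply(seq_len(u), function(k) colSums((t(B) - clA$centers[k, ])^2))
    clBB <- max.col(-d2, ties.method = "first")
    crand[s] <- corrected_rand(clB, clBB)
  }
  mean(crand)                                 # mean corrected Rand index over S runs
}
```

Values close to 1 indicate that the partition of B is reproduced by the centroids found in A, that is, a stable cluster structure.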

Value

A 3-dimensional array of A samples (first dimension represents the iteration step, second the object number, third the variable number)
B 3-dimensional array of B samples (first dimension represents the iteration step, second the object number, third the variable number)
centroid Array of matrices of cluster centroids for sample A (first dimension represents the iteration step)
medoid Array of matrices of observations on the u representative objects (medoids) for sample A (first dimension represents the iteration step)
clusteringA Array of A={A_1,A_2,...,A_u} (first dimension represents the iteration step, second the cluster number of each object)
clusteringB Array of B={B_1,B_2,...,B_u} (first dimension represents the iteration step, second the cluster number of each object)
clusteringBB Array of BB={BB_1,BB_2,...,BB_u} (first dimension represents the iteration step, second the cluster number of each object)
cRand mean value of the corrected Rand index over S simulations
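For reference, the corrected Rand index of Hubert and Arabie (1985) for two partitions of the same n objects is commonly written as follows, with n_{ij} the entries of their contingency table and a_i, b_j its row and column sums. cRand is presumably the mean of this quantity over the S simulations; which two partitions are compared in each simulation (for example clusteringB and clusteringBB) is an assumption here, since the details PDF is not reproduced.

```latex
\mathrm{CR} =
\frac{\sum_{i,j}\binom{n_{ij}}{2}
      - \sum_{i}\binom{a_i}{2}\sum_{j}\binom{b_j}{2}\Big/\binom{n}{2}}
     {\frac{1}{2}\left[\sum_{i}\binom{a_i}{2} + \sum_{j}\binom{b_j}{2}\right]
      - \sum_{i}\binom{a_i}{2}\sum_{j}\binom{b_j}{2}\Big/\binom{n}{2}},
\qquad
\mathrm{cRand} = \frac{1}{S}\sum_{s=1}^{S}\mathrm{CR}_s
```

The index equals 1 for identical partitions (up to relabelling of clusters) and has expected value 0 under random labelling with fixed cluster sizes.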

Author(s)

Marek Walesiak Marek.Walesiak@ae.jgora.pl, Andrzej Dudek Andrzej.Dudek@ae.jgora.pl

Department of Econometrics and Computer Science, University of Economics, Wroclaw, Poland http://www.ae.jgora.pl/keii

References

Breckenridge, J.N. (2000), Validating cluster analysis: consistent replication and symmetry, "Multivariate Behavioral Research", 35 (2), 261-285.

Gordon, A.D. (1999), Classification, Chapman and Hall/CRC, London.

Hubert, L., Arabie, P. (1985), Comparing partitions, "Journal of Classification", 2 (1), 193-218.

Milligan, G.W. (1996), Clustering validation: results and implications for applied analyses, In P. Arabie, L.J. Hubert, G. de Soete (Eds.), Clustering and classification, World Scientific, Singapore, 341-375.

Walesiak, M. (2007), Ocena stabilnosci wynikow klasyfikacji z wykorzystaniem analizy replikacji [Assessing the stability of classification results using replication analysis], Prace Naukowe AE we Wroclawiu (in preparation).

See Also

cluster.Sim, hclust, kmeans, dist, dist.BC, dist.SM, dist.GDM, data.Normalization

Examples

library(clusterSim)
data(data_ratio)
# replication analysis: k-means (default), 5 clusters, 10 simulations
w <- replication.Mod(data_ratio, u=5, S=10)
print(w)

[Package clusterSim version 0.30-7 Index]