HINoV.Symbolic {clusterSim} | R Documentation |
Modification of Heuristic Identification of Noisy Variables (HINoV) method for symbolic interval data
HINoV.Symbolic(x, u=NULL, distance="H", method = "pam", Index = "cRAND")
x |
symbolic interval data: a 3-dimensional array in which the first dimension indexes objects, the second indexes variables, and the third holds the lower and upper bounds of each interval |
u |
number of clusters |
distance |
"M" - minimal distance between all vertices of the hyper-cubes defined by symbolic interval variables; "H" - Hausdorff distance (default); "S" - sum of squared distances between all vertices of the hyper-cubes defined by symbolic interval variables |
method |
clustering method: "single", "ward", "complete", "average", "mcquitty", "median", "centroid", "pam" (default) |
Index |
"cRAND" - corrected Rand index (default); "RAND" - Rand index |
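The layout expected for x and the role of distance, method and u can be illustrated with a small base R sketch. It is not the package's implementation: the toy values and the helper name hausdorff_1d are made up for illustration, and it assumes that objects are clustered on one interval variable at a time, using the standard Hausdorff distance between two intervals, max(|a_lower - b_lower|, |a_upper - b_upper|). See the details file for how clusterSim actually computes the distances.

```r
# Toy symbolic interval data: 4 objects, 2 interval variables (made-up values).
# matrix() fills column by column, so each column is one variable.
lower <- matrix(c(1, 2, 8, 9,
                  0, 5, 1, 6), nrow = 4, ncol = 2)
upper <- matrix(c(2, 4, 9, 12,
                  1, 7, 3, 8), nrow = 4, ncol = 2)

# 3-dimensional array: x[i, j, 1] is the lower bound and x[i, j, 2] the
# upper bound of variable j for object i.
x <- array(c(lower, upper), dim = c(4, 2, 2))

# Hypothetical helper (not part of clusterSim): Hausdorff distance between
# the intervals of a single variable j, for all pairs of objects.
hausdorff_1d <- function(x, j) {
  lo <- x[, j, 1]
  up <- x[, j, 2]
  d <- pmax(abs(outer(lo, lo, "-")), abs(outer(up, up, "-")))
  as.dist(d)
}

d1 <- hausdorff_1d(x, 1)

# Partition of the objects based on variable 1 only, with u = 2 clusters
# and average linkage (one of the hierarchical options listed for method).
part1 <- cutree(hclust(d1, method = "average"), k = 2)
print(as.matrix(d1))
print(part1)
```

In this toy data the first two objects have intervals close to each other on variable 1, as do the last two, so the two-cluster partition separates them into those pairs.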
See file $R_HOME\library\clusterSim\pdf\HINoVSymbolic_details.pdf for further details
parim |
m x m symmetric matrix (m - number of variables) of pairwise corrected Rand (or Rand) indices: entry (j, l) compares the partition formed using the j-th variable with the partition formed using the l-th variable |
topri |
row sums of parim (one value per variable) |
stopri |
values of topri ranked in decreasing order; as used in the example, the first column holds the variable number and the second the topri value |
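How parim, topri and stopri relate to each other can be sketched in base R, starting from one partition per variable. This is an illustration under stated assumptions, not the package code: the partitions p1, p2, p3 and the helper names rand_index and corrected_rand are hypothetical, the indices follow the standard definitions of Rand (1971) and Hubert and Arabie (1985), and the diagonal of parim is included in the row sums. Since every diagonal entry equals 1 here, including or excluding it does not change the ranking.

```r
# Pair-counting quantities shared by both indices.
pair_counts <- function(a, b) {
  tab <- table(a, b)
  list(sum_ij = sum(choose(tab, 2)),
       sum_a  = sum(choose(rowSums(tab), 2)),
       sum_b  = sum(choose(colSums(tab), 2)),
       total  = choose(sum(tab), 2))
}

# Rand index: share of object pairs on which the two partitions agree.
rand_index <- function(a, b) {
  p <- pair_counts(a, b)
  (p$total + 2 * p$sum_ij - p$sum_a - p$sum_b) / p$total
}

# Corrected (adjusted) Rand index; returns NaN when both partitions
# consist of a single cluster, because the denominator is then zero.
corrected_rand <- function(a, b) {
  p <- pair_counts(a, b)
  expected <- p$sum_a * p$sum_b / p$total
  max_index <- (p$sum_a + p$sum_b) / 2
  (p$sum_ij - expected) / (max_index - expected)
}

# Hypothetical partitions of 6 objects, one per variable.
p1 <- c(1, 1, 1, 2, 2, 2)
p2 <- c(2, 2, 2, 1, 1, 1)   # same grouping as p1, labels swapped
p3 <- c(1, 2, 1, 2, 1, 2)   # unrelated grouping, playing the noisy variable
parts <- list(p1, p2, p3)

m <- length(parts)
parim <- matrix(0, m, m)
for (j in 1:m) {
  for (l in 1:m) {
    parim[j, l] <- corrected_rand(parts[[j]], parts[[l]])
  }
}

topri <- rowSums(parim)                      # one value per variable
ord <- order(topri, decreasing = TRUE)
stopri <- cbind(variable = ord, topri = topri[ord])
print(round(parim, 3))
print(stopri)
```

Variables whose partitions agree with those of the other variables get a high topri, while a variable with an unrelated partition ends up at the bottom of stopri and is a candidate noisy variable.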
Marek Walesiak Marek.Walesiak@ae.jgora.pl, Andrzej Dudek Andrzej.Dudek@ae.jgora.pl
Department of Econometrics and Computer Science, University of Economics, Wroclaw, Poland http://www.ae.jgora.pl/keii
Carmone, F.J., Kara, A., Maxwell, S. (1999), HINoV: a new method to improve market segment definition by identifying noisy variables, "Journal of Marketing Research", November, vol. 36, 501-509.
Hubert, L.J., Arabie, P. (1985), Comparing partitions, "Journal of Classification", no. 1, 193-218.
Rand, W.M. (1971), Objective criteria for the evaluation of clustering methods, "Journal of the American Statistical Association", no. 336, 846-850.
Walesiak, M., Dudek, A. (2007), Identification of noisy variables for nonmetric and symbolic data in cluster analysis, 31st Annual Conference of the German Classification Society (GfKl): Data Analysis, Machine Learning, and Applications (Freiburg, March, 7-9).
library(clusterSim)
data(data_symbolic)
r <- HINoV.Symbolic(data_symbolic, u=5)
print(r$stopri)
plot(r$stopri[,2], xlab="Variable number", ylab="topri", xaxt="n")
axis(1, at=c(1:max(r$stopri[,1])), labels=r$stopri[,1])

#symbolic data from .csv file
#library(clusterSim)
#dsym<-as.matrix(read.csv2(file="csv/symbolic.csv"))
#dim(dsym)<-c(dim(dsym)[1],dim(dsym)[2]%/%2,2)
#r<- HINoV.Symbolic(dsym, u=5)
#print(r$stopri)
#plot(r$stopri[,2], xlab="Variable number", ylab="topri", xaxt="n")
#axis(1,at=c(1:max(r$stopri[,1])),labels=r$stopri[,1])