edist {energy} | R Documentation |
Returns the E-distances (energy statistics) between clusters.
edist(x, sizes, distance = FALSE, ix = 1:sum(sizes), alpha = 1)
x | data matrix of pooled sample or Euclidean distances |
sizes | vector of sample sizes |
distance | logical: if TRUE, x is a distance matrix |
ix | a permutation of the row indices of x |
alpha | distance exponent |
A vector containing the pairwise two-sample multivariate
E-statistics for comparing clusters or samples is returned.
The e-distance between clusters is computed from the original pooled data, stacked in matrix x where each row is a multivariate observation, or from the distance matrix x of the original data, or from a distance object returned by dist. The first sizes[1] rows of the original data matrix are the first sample, the next sizes[2] rows are the second sample, etc. The permutation vector ix may be used to obtain e-distances corresponding to a clustering solution at a given level in the hierarchy.
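The row layout described above can be sketched in base R. This is only an illustration: the three small matrices s1, s2, s3 and the helper variable rows are hypothetical names, not part of the package.

```r
# Three hypothetical samples with the same number of columns.
set.seed(1)
s1 <- matrix(rnorm(6), nrow = 3)   # 3 observations
s2 <- matrix(rnorm(8), nrow = 4)   # 4 observations
s3 <- matrix(rnorm(4), nrow = 2)   # 2 observations

# Pool the samples by stacking rows, and record the sample sizes in order.
x <- rbind(s1, s2, s3)
sizes <- c(nrow(s1), nrow(s2), nrow(s3))

# Row indices of x that belong to each sample under the default ordering:
# rows 1-3 are sample 1, rows 4-7 are sample 2, rows 8-9 are sample 3.
rows <- split(seq_len(nrow(x)), rep(seq_along(sizes), times = sizes))
print(rows)
```

With the energy package loaded, x and sizes prepared this way are the inputs that a call such as edist(x, sizes) expects.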
The e-distance between two clusters C_i, C_j of sizes n_i, n_j proposed by Szekely and Rizzo (2003) is the e-distance e(C_i, C_j), defined by
e(C_i, C_j) = \frac{n_i n_j}{n_i + n_j} [2 M_{ij} - M_{ii} - M_{jj}],
where
M_{ij} = \frac{1}{n_i n_j} \sum_{p=1}^{n_i} \sum_{q=1}^{n_j} ||X_{ip} - X_{jq}||^a,
|| || denotes the Euclidean norm, a = alpha, and X_{ip} denotes the p-th observation in the i-th cluster. The exponent alpha should be in the interval (0,2].
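As a rough check of the definition, the two-cluster e-distance can be computed directly in base R, taking the weight as n_i n_j / (n_i + n_j) and M_{ij} as the mean of all n_i n_j pairwise distances raised to the power alpha. The function names mean_cross_dist and e_distance are hypothetical helpers written for this sketch; they are not the package's implementation.

```r
# Mean of ||a_p - b_q||^alpha over all pairs of rows (M_ij in the text).
mean_cross_dist <- function(a, b, alpha = 1) {
  n_a <- nrow(a)
  # Euclidean distances among all pooled rows, then keep the a-by-b block.
  d <- as.matrix(dist(rbind(a, b)))
  block <- d[seq_len(n_a), n_a + seq_len(nrow(b)), drop = FALSE]
  mean(block^alpha)
}

# e-distance between two samples, following the definition in the text.
e_distance <- function(a, b, alpha = 1) {
  n_a <- nrow(a)
  n_b <- nrow(b)
  w <- (n_a * n_b) / (n_a + n_b)
  w * (2 * mean_cross_dist(a, b, alpha) -
         mean_cross_dist(a, a, alpha) -
         mean_cross_dist(b, b, alpha))
}

# Tiny one-dimensional example: clusters {0, 1} and {3}.
# M_ab = 2.5, M_aa = 0.5, M_bb = 0, weight = 2/3, so e is about 3.
a <- matrix(c(0, 1), ncol = 1)
b <- matrix(3, ncol = 1)
e_ab <- e_distance(a, b)
print(e_ab)
```

Note that M_{ii} averages over all n_i^2 ordered pairs, including the zero distances of each observation to itself, which is why the within-cluster block is not restricted to p < q.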
An object of class dist containing the lower triangle of the e-distance matrix of cluster distances corresponding to the permutation of indices ix is returned.
Maria L. Rizzo mrizzo @ bgnet.bgsu.edu and Gabor J. Szekely gabors @ bgnet.bgsu.edu
Szekely, G. J. and Rizzo, M. L. (2005) Hierarchical Clustering
via Joint Between-Within Distances: Extending Ward's Minimum
Variance Method, Journal of Classification 22(2) 151-183.
http://dx.doi.org/10.1007/s00357-005-0012-9
Szekely, G. J. and Rizzo, M. L. (2004) Testing for Equal Distributions in High Dimension, InterStat, November (5).
Szekely, G. J. (2000) Technical Report 03-05, E-statistics: Energy of Statistical Samples, Department of Mathematics and Statistics, Bowling Green State University.
energy.hclust
eqdist.etest
ksample.e
## compute e-distances for 3 samples of iris data
data(iris)
edist(iris[,1:4], c(50,50,50))

## compute e-distances from vector of group labels
d <- dist(matrix(rnorm(100), nrow=50))
g <- cutree(energy.hclust(d), k=4)
edist(d, sizes=table(g), ix=rank(g, ties.method="first"))