edist {energy} | R Documentation |
Returns the E-distances (energy statistics) between clusters.
edist(x, sizes, distance = FALSE, ix = 1:sum(sizes), alpha = 1)
x | data matrix of pooled sample or Euclidean distances
sizes | vector of sample sizes
distance | logical: if TRUE, x is a distance matrix
ix | a permutation of the row indices of x
alpha | distance exponent
A vector containing the pairwise two-sample multivariate
E-statistics for comparing clusters or samples is returned.
The e-distance between clusters is computed from the original pooled data, stacked in matrix x where each row is a multivariate observation, or from the distance matrix x of the original data, or from a distance object returned by dist. The first sizes[1] rows of the original data matrix are the first sample, the next sizes[2] rows are the second sample, etc. The permutation vector ix may be used to obtain e-distances corresponding to a clustering solution at a given level in the hierarchy.
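The stacked layout described above can be illustrated with a small base R sketch. The helper name sample_rows is hypothetical and is not part of the energy package; it only shows which rows of x belong to which sample when ix is left at its default.

```r
# Hypothetical helper (not part of the energy package):
# list the row indices of each sample when the samples are stacked
# in the order given by `sizes`.
sample_rows <- function(sizes) {
  sizes <- as.integer(sizes)
  # label every pooled row with its sample number, then group rows by label
  split(seq_len(sum(sizes)), rep(seq_along(sizes), times = sizes))
}

rows <- sample_rows(c(50, 50, 50))
range(rows[[2]])   # first and last row of the second sample: 51 100
```

With sizes = c(50, 50, 50), as in the iris example, rows 1 to 50 form the first sample, rows 51 to 100 the second, and rows 101 to 150 the third.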
The e-distance between two clusters C_i and C_j of sizes n_i and n_j, proposed by Szekely and Rizzo (2003), is
$$e(C_i, C_j) = \frac{n_i n_j}{n_i + n_j}\left[2M_{ij} - M_{ii} - M_{jj}\right],$$
where
$$M_{ij} = \frac{1}{n_i n_j} \sum_{p=1}^{n_i} \sum_{q=1}^{n_j} \|X_{ip} - X_{jq}\|^{\alpha},$$
\|\cdot\| denotes the Euclidean norm, \alpha is the argument alpha, and X_{ip} denotes the p-th observation in the i-th cluster. The exponent alpha should be in the interval (0,2].
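As a rough illustration of the definition, the following base R sketch computes the e-distance between two samples directly from the pairwise Euclidean distances. The function name e_dist_pair is hypothetical; this is not the package's implementation, only a plain transcription of the formula under the assumption that each row is one observation.

```r
# Hypothetical helper (not part of the energy package):
# e-distance between two samples a and b, rows = observations.
e_dist_pair <- function(a, b, alpha = 1) {
  stopifnot(alpha > 0, alpha <= 2)
  a <- as.matrix(a)
  b <- as.matrix(b)
  n_i <- nrow(a)
  n_j <- nrow(b)
  # all pairwise Euclidean distances of the pooled sample, raised to alpha
  d <- as.matrix(dist(rbind(a, b)))^alpha
  i <- seq_len(n_i)          # rows of the first sample
  j <- n_i + seq_len(n_j)    # rows of the second sample
  m_ij <- mean(d[i, j])      # mean between-sample distance
  m_ii <- mean(d[i, i])      # mean within-sample distance (zero diagonal included)
  m_jj <- mean(d[j, j])
  (n_i * n_j / (n_i + n_j)) * (2 * m_ij - m_ii - m_jj)
}

# e-distance between the first two iris species
data(iris)
e_dist_pair(iris[1:50, 1:4], iris[51:100, 1:4])
```

Taking the mean over the full n_i by n_i block matches the double sum in M_ii, since the zero distances on the diagonal are part of that sum.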
An object of class dist containing the lower triangle of the e-distance matrix of cluster distances corresponding to the permutation of indices ix is returned.
Maria L. Rizzo rizzo@math.ohiou.edu and Gabor J. Szekely gabors@bgnet.bgsu.edu
Szekely, G. J. and Rizzo, M. L. (2005) Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method, Journal of Classification 22(2) (in press).
Szekely, G. J. and Rizzo, M. L. (2004) Testing for Equal Distributions in High Dimension, InterStat, November (5).
Szekely, G. J. (2000) Technical Report 03-05, E-statistics: Energy of Statistical Samples, Department of Mathematics and Statistics, Bowling Green State University.
energy.hclust
eqdist.etest
ksample.e
## compute e-distances for 3 samples of iris data
data(iris)
edist(iris[,1:4], c(50,50,50))

## compute e-distances from vector of group labels
d <- dist(matrix(rnorm(100), nrow=50))
g <- cutree(energy.hclust(d), k=4)
edist(d, sizes=table(g), ix=rank(g, ties.method="first"))