ksample.e {energy}    R Documentation

E-statistic (Energy Statistic) for Multivariate k-sample Test of Equal Distributions

Description

Returns the E-statistic (energy statistic) for the multivariate k-sample test of equal distributions.

Usage

 ksample.e(x, sizes, distance = FALSE, ix = 1:sum(sizes))

Arguments

x           data matrix of pooled sample
sizes       vector of sample sizes
distance    logical: if TRUE, x is a distance matrix
ix          a permutation of the row indices of x

Details

The k-sample multivariate E-statistic for testing equal distributions is returned. The statistic is computed from the original pooled samples, stacked in matrix x where each row is a multivariate observation, or from the distance matrix x of the original data. The first sizes[1] rows of x are the first sample, the next sizes[2] rows of x are the second sample, etc.
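The row layout described above can be illustrated with a short base R sketch. It only builds a pooled matrix and a matching sizes vector from the iris data; the names samples, starts and ends are illustrative and are not part of the package.

```r
# Stack three samples row-wise; the row order must match `sizes`.
data(iris)
samples <- split(iris[, 1:4], iris$Species)   # list of 3 data frames, one per species
x <- as.matrix(do.call(rbind, samples))       # pooled 150 x 4 numeric matrix
sizes <- vapply(samples, nrow, integer(1))    # sample sizes in the same order as the rows

# Row ranges occupied by each sample in the pooled matrix
ends <- cumsum(sizes)
starts <- ends - sizes + 1
rbind(starts, ends)
```

The resulting x and sizes have the shape expected by the x and sizes arguments: rows starts[i] to ends[i] of x hold the i-th sample.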

The two-sample E-statistic proposed by Szekely and Rizzo (2004) is the e-distance e(S_i,S_j), defined for two samples S_i, S_j of size n_i, n_j by

e(S_i, S_j) = (n_i n_j)/(n_i + n_j) [2 M_(ij) - M_(ii) - M_(jj)],

where

M_(ij) = 1/(n_i n_j) sum[p = 1:n_i, q = 1:n_j] ||X_(ip) - X_(jq)||,

|| || denotes Euclidean norm, and X_(ip) denotes the p-th observation in the i-th sample. The k-sample E-statistic is defined by summing the pairwise e-distances over all k(k-1)/2 pairs of samples:

E = sum[i<j] e(S_i,S_j).

Large values of E are significant, that is, they are evidence against the hypothesis of equal distributions.
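As a rough illustration of the definitions above, the following base R sketch computes E directly from the formulas, using the weight n_i n_j / (n_i + n_j) for each pair. The function name e_stat_sketch is hypothetical; it is not the package's implementation, it stores the full distance matrix, and it is meant only to make the definition concrete.

```r
# Minimal sketch of the k-sample E-statistic, written from the formulas.
# e_stat_sketch is an illustrative name, not a function of the energy package.
e_stat_sketch <- function(x, sizes) {
  x <- as.matrix(x)                           # a vector becomes a one-column matrix
  sizes <- as.numeric(sizes)
  stopifnot(nrow(x) == sum(sizes))
  d <- as.matrix(dist(x))                     # Euclidean distances between all rows
  g <- rep(seq_along(sizes), times = sizes)   # sample label of each row
  k <- length(sizes)

  # M[i, j] = mean distance between observations of sample i and sample j
  M <- matrix(0, k, k)
  for (i in seq_len(k)) for (j in seq_len(k))
    M[i, j] <- mean(d[g == i, g == j, drop = FALSE])

  # Sum the pairwise e-distances over all pairs i < j
  E <- 0
  for (i in seq_len(k - 1)) for (j in (i + 1):k) {
    w <- sizes[i] * sizes[j] / (sizes[i] + sizes[j])
    E <- E + w * (2 * M[i, j] - M[i, i] - M[j, j])
  }
  E
}

# Two one-point samples at 0 and 3: M_(12) = 3, M_(11) = M_(22) = 0, weight 1/2
e_stat_sketch(matrix(c(0, 3), ncol = 1), c(1, 1))
```

For the two one-point samples the statistic is (1/2)(2 * 3) = 3, and two identical samples give 0, which is a quick sanity check on the sign and weight conventions.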

Value

The value of the multisample E-statistic corresponding to the permutation ix is returned.
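One way to read the role of ix: the statistic for a permutation is the statistic of the pooled rows taken in the order ix, with the same sizes. The sketch below, in base R, uses that idea for an approximate permutation test on a precomputed distance matrix. The helper e_stat_perm and the choice of R = 199 replicates are assumptions made for illustration; this is not necessarily how eqdist.etest performs the calculation.

```r
# E-statistic from a precomputed distance matrix d, for the row order ix.
# e_stat_perm is an illustrative helper, not part of the energy package.
e_stat_perm <- function(d, sizes, ix = seq_len(sum(sizes))) {
  d <- d[ix, ix, drop = FALSE]                # reorder rows and columns together
  g <- rep(seq_along(sizes), times = sizes)   # sample label of each reordered row
  k <- length(sizes)
  E <- 0
  for (i in seq_len(k - 1)) for (j in (i + 1):k) {
    w <- sizes[i] * sizes[j] / (sizes[i] + sizes[j])
    Mij <- mean(d[g == i, g == j, drop = FALSE])
    Mii <- mean(d[g == i, g == i, drop = FALSE])
    Mjj <- mean(d[g == j, g == j, drop = FALSE])
    E <- E + w * (2 * Mij - Mii - Mjj)
  }
  E
}

set.seed(1)
x <- c(rnorm(20), rnorm(30, mean = 1))        # two univariate samples
sizes <- c(20, 30)
d <- as.matrix(dist(x))                       # computed once, reused for every replicate

observed <- e_stat_perm(d, sizes)             # identity permutation
R <- 199                                      # number of permutation replicates (assumed)
reps <- replicate(R, e_stat_perm(d, sizes, ix = sample(sum(sizes))))
p_value <- (1 + sum(reps >= observed)) / (R + 1)
p_value
```

Because only the index vector changes between replicates, the distances are computed a single time. Permuting rows within a sample leaves the statistic unchanged; only moving observations across samples alters it.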

Note

The pairwise e-distances between samples can be conveniently computed by the edist function, which returns a dist object. The function ksample.e computes the E-statistic only. For the test decision, a nonparametric bootstrap test (approximate permutation test) is provided by the function eqdist.etest. With the default arguments, ksample.e computes the statistic without storing the distance matrix. For the test statistic only, ksample.e is usually faster than calling eqdist.e, but for a permutation test the method of calculation in eqdist.etest computes the replicates much faster.

Author(s)

Maria L. Rizzo mrizzo @ bgnet.bgsu.edu and Gabor J. Szekely gabors @ bgnet.bgsu.edu

References

Szekely, G. J. and Rizzo, M. L. (2004) Testing for Equal Distributions in High Dimension, InterStat, November (5).

Szekely, G. J. (2000) Technical Report 03-05: E-statistics: Energy of Statistical Samples, Department of Mathematics and Statistics, Bowling Green State University.

See Also

eqdist.etest, edist, energy.hclust

Examples

## compute 3-sample E-statistic for 4-dimensional iris data
 data(iris)
 ksample.e(iris[,1:4], c(50,50,50))

## compute a 3-sample univariate E-statistic
 ksample.e(rnorm(150), c(25,75,50))

[Package energy version 1.1-0 Index]