distance correlation {energy}    R Documentation
Computes distance covariance and distance correlation statistics, which are multivariate measures of dependence.
dcov(x, y, index = 1.0)
dcor(x, y, index = 1.0)
DCOR(x, y, index = 1.0)
x        matrix: first sample, observations in rows
y        matrix: second sample, observations in rows
index    exponent on Euclidean distance, in (0,2]
dcov, dcor, and DCOR compute distance covariance and distance correlation statistics. DCOR is a self-contained R function returning a list of statistics. dcor executes faster than DCOR (see examples).
The sample sizes (number of rows) of the two samples must agree, and samples must not contain missing values.
Distance correlation is a new measure of dependence between random vectors introduced by Szekely, Rizzo, and Bakirov (2007). For all distributions with finite first moments, distance correlation R generalizes the idea of correlation in two fundamental ways: (1) R(X,Y) is defined for X and Y in arbitrary dimension. (2) R(X,Y)=0 characterizes independence of X and Y.
Distance correlation satisfies 0 <= R <= 1, and R = 0 only if X and Y are independent. Distance covariance V provides a new approach to the problem of testing the joint independence of random vectors. The formal definitions of the population coefficients V and R are given in (SRB 2007). The definitions of the empirical coefficients are as follows.
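As a rough illustration of property (2), the sketch below compares Pearson correlation with a sample distance correlation on a deterministic nonlinear relationship. It uses only base R; dcor_base is a hypothetical helper written for this example from the empirical definitions in this section (index 1), not a function of the energy package.

```r
# Base R sketch; dcor_base is a hypothetical helper, not an energy function.
dcor_base <- function(x, y) {
  # double-center a distance matrix: subtract row means and column means,
  # then add back the grand mean
  center <- function(d) {
    sweep(sweep(d, 1, rowMeans(d)), 2, colMeans(d)) + mean(d)
  }
  A <- center(as.matrix(dist(x)))
  B <- center(as.matrix(dist(y)))
  v2xy  <- mean(A * B)                      # squared distance covariance
  denom <- sqrt(mean(A * A) * mean(B * B))  # sqrt of product of squared variances
  if (denom > 0) sqrt(max(v2xy, 0) / denom) else 0
}

x <- seq(-1, 1, length.out = 21)  # symmetric about zero
y <- x^2                          # fully determined by x, but not linearly

pearson <- cor(x, y)        # zero up to rounding error
dc      <- dcor_base(x, y)  # clearly positive

print(c(pearson = pearson, dcor = dc))
```

Here y is a function of x, so the two are dependent, yet the linear correlation vanishes by symmetry while the distance correlation does not.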
The empirical distance covariance V_n(X,Y) with index 1 is the nonnegative number defined by
    V_n^2(X,Y) = \frac{1}{n^2} \sum_{k,l=1}^{n} A_{kl} B_{kl},
where A_{kl} and B_{kl} are
    A_{kl} = a_{kl} - \bar{a}_{k.} - \bar{a}_{.l} + \bar{a}_{..},
    B_{kl} = b_{kl} - \bar{b}_{k.} - \bar{b}_{.l} + \bar{b}_{..}.
Here
    a_{kl} = \|X_k - X_l\|_p,   b_{kl} = \|Y_k - Y_l\|_q,   k,l = 1,\dots,n,
and the subscript . denotes that the mean is computed for the index that it replaces. Similarly, V_n(X) is the nonnegative number defined by
    V_n^2(X) = V_n^2(X,X) = \frac{1}{n^2} \sum_{k,l=1}^{n} A_{kl}^2.
The empirical distance correlation R_n(X,Y) is the square root of
    R_n^2(X,Y) = \frac{V_n^2(X,Y)}{\sqrt{V_n^2(X) V_n^2(Y)}}.
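The definitions above can be traced step by step in base R. The following is a minimal sketch under stated assumptions, not the package's implementation: dcor_sketch and double_center are hypothetical names, and the index argument is assumed to act as a power applied to each Euclidean distance, following the description of index in the arguments.

```r
# Sketch of the empirical formulas in base R (not the energy package code).
# dcor_sketch and double_center are hypothetical names for illustration.
dcor_sketch <- function(x, y, index = 1.0) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  stopifnot(nrow(x) == nrow(y), !anyNA(x), !anyNA(y),
            index > 0, index <= 2)

  # a_{kl}, b_{kl}: pairwise Euclidean distances (assumed raised to 'index')
  a <- as.matrix(dist(x))^index
  b <- as.matrix(dist(y))^index

  # A_{kl} = a_{kl} - abar_{k.} - abar_{.l} + abar_{..}
  double_center <- function(d) {
    sweep(sweep(d, 1, rowMeans(d)), 2, colMeans(d)) + mean(d)
  }
  A <- double_center(a)
  B <- double_center(b)

  # mean() over an n x n matrix is (1/n^2) * sum over k, l
  V2xy <- mean(A * B)   # V_n^2(X,Y)
  V2x  <- mean(A * A)   # V_n^2(X)
  V2y  <- mean(B * B)   # V_n^2(Y)

  denom <- sqrt(V2x * V2y)
  R2 <- if (denom > 0) V2xy / denom else 0   # R_n^2(X,Y)

  # max(., 0) guards against tiny negative values from rounding
  list(dCov  = sqrt(max(V2xy, 0)),
       dCor  = sqrt(max(R2, 0)),
       dVarX = sqrt(V2x),
       dVarY = sqrt(V2y))
}

# Deterministic toy data: 10 observations, x in 2 dimensions, y in 1
x <- cbind(1:10, (1:10)^2)
y <- matrix(sin(1:10), ncol = 1)
res <- dcor_sketch(x, y)
str(res)
```

The list elements are named after those returned by DCOR so the two can be compared side by side if the energy package is available.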
See dcov.test for a test of multivariate independence based on the distance covariance statistic.
dcov returns the sample distance covariance and dcor returns the sample distance correlation. DCOR returns a list with elements
dCov     sample distance covariance
dCor     sample distance correlation
dVarX    distance variance of x sample
dVarY    distance variance of y sample
Two methods of computing the statistics are provided. DCOR is a stand-alone R function that returns a list of statistics. dcov and dcor provide R interfaces to the C implementation, which is usually faster. dcov and dcor call an internal function .dcov.
Note that it is inefficient to compute dCor by:
square root of dcov(x,y)/sqrt(dcov(x,x)*dcov(y,y))
because the individual calls to dcov involve unnecessary repetition of calculations. For this reason, both .dcov and DCOR compute and return all four statistics.
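The cost of the repeated calls can be seen in a small base R sketch. All names here (double_center, all_stats, dcov_only) are hypothetical and stand in for the package functions. The sketch assumes the covariance helper returns V_n itself, that is, the square root of V_n^2, so under that convention the ratio dCov / sqrt(dVarX * dVarY) is already R_n.

```r
# Hypothetical helper names; base R only, not the energy internals.
double_center <- function(d) {
  sweep(sweep(d, 1, rowMeans(d)), 2, colMeans(d)) + mean(d)
}

# One pass: each distance matrix is built and centered exactly once,
# and all four statistics are derived from the same A and B.
all_stats <- function(x, y) {
  A <- double_center(as.matrix(dist(x)))
  B <- double_center(as.matrix(dist(y)))
  dCov  <- sqrt(max(mean(A * B), 0))
  dVarX <- sqrt(mean(A * A))
  dVarY <- sqrt(mean(B * B))
  prod_var <- dVarX * dVarY
  dCor <- if (prod_var > 0) dCov / sqrt(prod_var) else 0
  c(dCov = dCov, dCor = dCor, dVarX = dVarX, dVarY = dVarY)
}

# Covariance only: returns V_n and discards everything else.
dcov_only <- function(x, y) {
  A <- double_center(as.matrix(dist(x)))
  B <- double_center(as.matrix(dist(y)))
  sqrt(max(mean(A * B), 0))
}

x <- cbind(1:12, cos(1:12))
y <- matrix((1:12)^2, ncol = 1)

# Three separate calls: the distance matrices of x and y are each
# computed and centered three times (six centerings in total).
naive <- dcov_only(x, y) / sqrt(dcov_only(x, x) * dcov_only(y, y))

# Single call: two centerings in total, same distance correlation.
one_pass <- all_stats(x, y)

print(c(naive = naive, one_pass = one_pass[["dCor"]]))
```

Both routes give the same number; the single-pass version simply avoids rebuilding and re-centering the same distance matrices.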
Maria L. Rizzo <mrizzo@bgnet.bgsu.edu> and Gabor J. Szekely <gabors@bgnet.bgsu.edu>
Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007),
Measuring and Testing Dependence by Correlation of Distances,
Annals of Statistics, Vol. 35 No. 6, pp. 2769-2794.
http://dx.doi.org/10.1214/009053607000000505
## independent multivariate data
x <- matrix(rnorm(60), nrow=20, ncol=3)
y <- matrix(rnorm(40), nrow=20, ncol=2)

## C implementation
dcov(x, y, 1.5)
dcor(x, y, 1.5)
.dcov(x, y, 1.5)

## R implementation
DCOR(x, y, 1.5)

## Not run:
## compare speed of R version and C version
set.seed(111)
## R version
system.time(replicate(1000, DCOR(x, y)))
set.seed(111)
## C version
system.time(replicate(1000, .dcov(x, y)))

## End(Not run)