Distance correlation

Distance correlation, in statistics and probability theory, is a measure of dependence between two random variables. The classical measure of dependence, the Pearson correlation coefficient, is mainly sensitive to a linear relationship between two variables. Distance correlation was introduced in 2005 by Gábor J. Székely in several lectures to address this deficiency of Pearson's correlation, namely that it can easily be zero for dependent variables. A Pearson correlation of 0 (uncorrelatedness) does not imply independence, whereas a distance correlation of 0 does imply independence. The first results on distance correlation were published in 2007 and 2009 (Székely, Bakirov and Rizzo, 2007; Székely and Rizzo, 2009). It has been proved that distance correlation is the same as Brownian correlation.

Definition

The distance correlation of two random variables is obtained by dividing their distance covariance by the product of their distance standard deviations. The distance correlation is denoted by dcor(X,Y), and the empirical (sample) distance correlation by dcor_n(X,Y).
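Written out as a formula, the verbal definition above can be sketched as follows. This assumes that the distance standard deviation of X is the square root of the distance covariance of X with itself, and that dcor is set to 0 when the denominator vanishes; the symbol dcov is used here for distance covariance.

```latex
\[
\operatorname{dcor}(X,Y) =
\frac{\operatorname{dcov}(X,Y)}
     {\sqrt{\operatorname{dcov}(X,X)}\,\sqrt{\operatorname{dcov}(Y,Y)}}
\]
```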

For computation of distance covariance and distance correlation, see the dcov and dcor functions in the energy package (E-statistics) for R (programming language).
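To illustrate what such a routine computes, here is a minimal NumPy sketch of the empirical distance correlation dcor_n(X,Y). It assumes the sample formula of Székely, Bakirov and Rizzo (2007), which is built from double-centered matrices of pairwise Euclidean distances. The function names are hypothetical and are not part of the energy package, and the O(n^2) memory use makes this suitable only for small samples.

```python
import numpy as np


def _centered_distances(sample):
    """Pairwise Euclidean distance matrix of a sample, double-centered.

    Rows of `sample` are observations; a 1-D array is treated as
    n observations of a scalar variable.
    """
    z = np.asarray(sample, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    # d[k, l] = |z_k - z_l|
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    # Subtract row means and column means, then add back the grand mean.
    return (d
            - d.mean(axis=0, keepdims=True)
            - d.mean(axis=1, keepdims=True)
            + d.mean())


def distance_correlation(x, y):
    """Empirical distance correlation of two samples of equal size (sketch)."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    if a.shape != b.shape:
        raise ValueError("x and y must contain the same number of observations")
    dcov2_xy = (a * b).mean()   # squared sample distance covariance
    dvar2_x = (a * a).mean()    # squared sample distance variance of x
    dvar2_y = (b * b).mean()    # squared sample distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        return 0.0              # convention assumed here: a constant sample gives 0
    # Clamp tiny negative rounding errors before taking the square root.
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))
```

A call such as distance_correlation(x, y) takes two arrays with one row per observation; the two samples may have different dimensions but must have the same number of observations.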

Properties

(i) 0 ≤ dcor_n(X,Y) ≤ 1 and 0 ≤ dcor(X,Y) ≤ 1.

(ii) dcor(X,Y) = 0 if and only if X and Y are independent.

(iii) dcor_n(X,Y) = 1 implies that the dimensions of the linear spaces spanned by the X and Y samples, respectively, are almost surely equal. If we further assume that these subspaces are equal, then in this subspace Y = a + b CX for some vector a, scalar b, and orthonormal matrix C.
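The properties above can be checked numerically on small samples. The following sketch is an illustration rather than a proof: it uses a hypothetical helper dcor_n, based on the double-centered distance formula of Székely, Bakirov and Rizzo (2007), and made-up example data. The first part shows a dependent pair (y = x^2 on a symmetric grid) whose Pearson correlation vanishes while the empirical distance correlation does not. The second part checks the easy converse of (iii): a sample of the form Y = a + b XC with orthogonal C has dcor_n equal to 1 up to rounding. Observations are stored as rows, so the matrix C multiplies from the right.

```python
import numpy as np


def dcor_n(x, y):
    """Empirical distance correlation (hypothetical helper; rows = observations)."""
    def centered(z):
        z = np.asarray(z, dtype=float)
        z = z[:, None] if z.ndim == 1 else z
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
        return (d - d.mean(axis=0, keepdims=True)
                  - d.mean(axis=1, keepdims=True) + d.mean())

    a, b = centered(x), centered(y)
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0.0:
        return 0.0
    return float(np.sqrt(max((a * b).mean(), 0.0) / denom))


# Dependent but uncorrelated: y is a function of x, yet Pearson's r is zero
# because x is symmetric about 0.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2
pearson = float(np.corrcoef(x, y)[0, 1])
dcor_parabola = dcor_n(x, y)

# Shift, scaling and orthogonal transformation preserve pairwise distances
# up to the factor |b|, so dcor_n should equal 1 up to rounding.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
C, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # C is an orthogonal matrix
a_shift = np.array([1.0, -2.0, 0.5])
b_scale = -2.5
Y = a_shift + b_scale * X @ C
dcor_rotated = dcor_n(X, Y)

print("Pearson r for y = x^2:", pearson)
print("dcor_n    for y = x^2:", dcor_parabola)
print("dcor_n for Y = a + b XC:", dcor_rotated)
```

Property (ii) concerns the population quantity dcor(X,Y); for independent samples of finite size, dcor_n is typically small but not exactly zero.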

References

Pearson, K. (1895). Royal Society Proceedings, 58, 241.
Pearson, K. (1920). Notes on the history of correlation, Biometrika, 13, 25-45.
Székely, G. J., Bakirov, N. K., and Rizzo, M. L. (2007). Measuring and testing dependence by correlation of distances, The Annals of Statistics, 35, 2769-2794.
Székely, G. J. and Rizzo, M. L. (2009). Brownian distance covariance, The Annals of Applied Statistics, 3(4), 1233-1308.