Entropy estimation
Estimating the differential entropy of a system or process, given some observations, is useful in a variety of science and engineering applications, such as independent component analysis[1], image analysis[2], genetic analysis[3], speech recognition[4], manifold learning[5], and time delay estimation[6]. The simplest and most common approach uses histogram-based estimation, but other approaches have been developed and used, each with its own benefits and drawbacks.[7]
Histogram estimator
The histogram approach uses the idea that the differential entropy of a probability distribution with density f,

H(f) = -\int f(x) \log f(x) \, dx,

can be approximated by first producing a histogram of the observations and then computing the discrete entropy of that histogram,

\hat{H} = -\sum_{i=1}^{k} p_i \log p_i,

where k is the number of bins and p_i is the proportion of observations falling in the i-th bin. A bin-width term (adding \log w for bins of common width w) is also needed if the result is to approximate the differential entropy rather than the discrete entropy of the binned data.
Histograms are quick and simple to compute, so this approach has some appeal. However, the estimate produced is biased, and although corrections can be made to the estimate, they may not always be satisfactory.[8]
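As an illustration only, here is a minimal sketch of such an estimator in Python using NumPy (the function name histogram_entropy, the default bin count, and the use of natural logarithms, giving a result in nats, are arbitrary choices rather than anything prescribed by the sources cited here):

```python
import numpy as np

def histogram_entropy(samples, bins=30):
    """Rough differential-entropy estimate (in nats) from a histogram."""
    counts, edges = np.histogram(samples, bins=bins)
    p = counts / counts.sum()        # proportion of observations in each bin
    widths = np.diff(edges)          # bin widths
    nz = p > 0                       # empty bins contribute nothing (0 log 0 = 0)
    # Discrete entropy of the bin proportions, plus the bin-width term
    # needed to approximate the differential rather than discrete entropy.
    return -np.sum(p[nz] * np.log(p[nz])) + np.sum(p[nz] * np.log(widths[nz]))
```

For a large sample from a standard normal distribution, histogram_entropy(np.random.randn(10000)) should come out near the true value 0.5 * log(2 * pi * e) ≈ 1.42, with the bias mentioned above depending on the number of bins chosen.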
Estimates based on sample-spacings
If the data is one-dimensional, we can imagine taking all the observations and putting them in order of their value. The spacing between one value and the next then gives a rough idea of (the reciprocal of) the probability density in that region: the closer together the values are, the higher the probability density. This is a very rough estimate with high variance, but it can be improved, for example by considering the spacing between a given value and the one m places away from it, where m is some fixed integer[7].
The probability density estimated in this way can then be used to calculate the entropy estimate, in a similar way to that described above for the histogram, but with some minor adjustments.
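As an illustration, the following is a minimal sketch of this m-spacing idea for one-dimensional data in Python, without the bias corrections discussed in [7] (the function name m_spacing_entropy and the default m are arbitrary choices):

```python
import numpy as np

def m_spacing_entropy(samples, m=3):
    """Simple m-spacing entropy estimate (in nats) for one-dimensional data."""
    x = np.sort(np.asarray(samples, dtype=float))  # put the observations in order
    n = len(x)
    spacings = x[m:] - x[:-m]          # gap between each value and the one m places away
    spacings = spacings[spacings > 0]  # tied observations give zero spacings; skip them
    # Each m-spacing suggests a local density of roughly m / ((n + 1) * spacing),
    # and the entropy estimate is the average of -log(density) over the sample.
    return np.mean(np.log((n + 1) / m * spacings))
```

This simple form is biased; the corrected spacing estimators surveyed in [7] adjust it with additional terms.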
One of the main drawbacks of this approach is going beyond one dimension: the idea of lining the data points up in order falls apart in more than one dimension. However, using analogous methods, some multidimensional entropy estimators have been developed.[9]
Estimates based on nearest-neighbours
For each point in the data set, the distance to its nearest neighbour can be found. An entropy estimator can then be constructed from the distribution of these nearest-neighbour distances over all data points.[7]
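A widely used estimator of this kind is the Kozachenko-Leonenko nearest-neighbour estimator, which is not described in detail in the sources above; the following Python sketch shows one common form of it (the function name knn_entropy is arbitrary, the brute-force distance computation is purely illustrative, and duplicate points would need to be removed or jittered beforehand):

```python
import numpy as np
from scipy.special import digamma, gamma

def knn_entropy(samples, k=1):
    """Kozachenko-Leonenko style k-nearest-neighbour entropy estimate (in nats)."""
    x = np.asarray(samples, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                 # treat 1-D input as n points in one dimension
    n, d = x.shape
    # Pairwise distances between all points (brute force, O(n^2) memory).
    dists = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)    # a point is not its own neighbour
    r = np.sort(dists, axis=1)[:, k - 1]       # distance to the k-th nearest neighbour
    v_d = np.pi ** (d / 2) / gamma(d / 2 + 1)  # volume of the unit ball in d dimensions
    return digamma(n) - digamma(k) + np.log(v_d) + d * np.mean(np.log(r))
```

For a large two-dimensional standard normal sample, knn_entropy(np.random.randn(5000, 2)) should come out near the true value log(2 * pi * e) ≈ 2.84.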
References
- ^ Dinh-Tuan Pham (2004) Fast algorithms for mutual information based independent component analysis. In IEEE Transactions on Signal Processing, Volume 52, Issue 10, pp. 2690–2700. doi:10.1109/TSP.2004.834398
- ^ Chang, C.-I.; Du, Y.; Wang, J.; Guo, S.-M.; Thouin, P. D. (2006) Survey and comparative analysis of entropy and relative entropy thresholding techniques. In IEE Proceedings - Vision, Image and Signal Processing, Volume 153, Issue 6, pp. 837–850. doi:10.1049/ip-vis:20050032
- ^ Martins, D. C. et al. (2008) Intrinsically Multivariate Predictive Genes. In IEEE Journal of Selected Topics in Signal Processing, Volume 2, Issue 3, pp. 424–439. doi:10.1109/JSTSP.2008.923841
- ^ Gue Jun Jung; Yung-Hwan Oh (2008) Information Distance-Based Subvector Clustering for ASR Parameter Quantization. In IEEE Signal Processing Letters, Volume 15, pp. 209–212. doi:10.1109/LSP.2007.913132
- ^ Costa, J. A.; Hero, A. O. (2004) Geodesic entropic graphs for dimension and entropy estimation in manifold learning. In IEEE Transactions on Signal Processing, Volume 52, Issue 8, pp. 2210–2221. doi:10.1109/TSP.2004.831130
- ^ Benesty, J.; Yiteng Huang; Jingdong Chen (2007) Time Delay Estimation via Minimum Entropy. In IEEE Signal Processing Letters, Volume 14, Issue 3, pp. 157–160. doi:10.1109/LSP.2006.884038
- ^ J. Beirlant, E. J. Dudewicz, L. Gyorfi, and E. C. van der Meulen (1997) Nonparametric entropy estimation: An overview. In International Journal of Mathematical and Statistical Sciences, Volume 6, pp. 17–39.
- ^ G. Miller (1955) Note on the bias of information estimates. In Information Theory in Psychology: Problems and Methods, pp. 95–100.
- ^ E. G. Learned-Miller (2003) A new class of entropy estimators for multi-dimensional densities. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP’03), Volume 3, pp. 297–300.