
Blahut–Arimoto algorithm


The term Blahut–Arimoto algorithm is often used to refer to a class of algorithms for numerically computing either the information-theoretic capacity of a channel or the rate-distortion function of a source. They are iterative algorithms that converge to a solution of the optimization problem associated with these information-theoretic concepts. For the classical cases of a discrete memoryless channel and a finite-alphabet source, the underlying problem is convex, so the iteration converges to a global optimum from any strictly positive starting distribution.
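For illustration, the following is a minimal NumPy sketch of the capacity iteration for a discrete memoryless channel; the function name, stopping rule, and the use of nats are illustrative choices, not taken from the references.

    import numpy as np

    def blahut_arimoto_capacity(W, tol=1e-9, max_iter=10_000):
        """Illustrative sketch (names are ours): capacity of a discrete
        memoryless channel, in nats.

        W is an (n_inputs, n_outputs) row-stochastic array, W[x, y] = P(y | x).
        Returns the capacity estimate and the optimizing input distribution.
        """
        n = W.shape[0]
        p = np.full(n, 1.0 / n)               # uniform starting distribution
        lower = 0.0
        for _ in range(max_iter):
            q = p @ W                         # output marginal q(y) = sum_x p(x) W(y|x)
            # D[x] = KL(W(.|x) || q), using the convention 0 * log 0 = 0
            with np.errstate(divide="ignore", invalid="ignore"):
                log_ratio = np.where(W > 0, np.log(W / q), 0.0)
            D = (W * log_ratio).sum(axis=1)
            lower, upper = p @ D, D.max()     # bounds that sandwich the capacity
            if upper - lower < tol:
                break
            p = p * np.exp(D)                 # multiplicative update
            p /= p.sum()
        return lower, p

As a quick check, for the binary symmetric channel with crossover probability 0.1, blahut_arimoto_capacity(np.array([[0.9, 0.1], [0.1, 0.9]])) returns roughly 0.368 nats (about 0.531 bits), matching the known closed-form value.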

Overview

In its general form, the algorithm shows strong similarities to the training of (restricted/bipartite) Boltzmann machines (a.k.a. "folded" deep belief networks) in machine learning, introduced by Geoffrey Hinton. In terms of its equations, it is closely related to Gibbs sampling with contrastive divergence, a concept borrowed from physics and Bayesian statistics that is used to train exponential families. The connection to machine learning is that the "code" (the efficient representation of the information) retains only the relevant information about either a probability density function (pdf), in the capacity case, or the data, in the machine-learning case; the data are drawn from an unknown pdf that is learned by tuning the model parameters to minimize the expected distortion.

During the iterations, one has to compute a partition function (for marginalization), which is usually hard to compute and is therefore approximated by drawing samples from the posterior(s). The whole algorithm can be pictured as a cooling process that continuously passes information between two layers and finally freezes at the optimal conditional (equilibrium) pdf. Described in an "energy-inspired" formalism, the model is trained to reach its ground state. A physics-based interpretation is that the flow of information passed during training corresponds to an exchange of virtual particles, describing a force between the layers that finally bends both conditional pdfs to reproduce the input as well as possible. In other words, the most relevant information in the input is distilled, which links the algorithm to the information bottleneck principle.

History and application

For the case of channel capacity, the algorithm was independently invented by Suguru Arimoto[1] and Richard Blahut.[2] In the case of lossy compression, the corresponding algorithm was invented by Richard Blahut.[2] The algorithm is most applicable to the case of arbitrary finite-alphabet sources. Much work has been done to extend it to more general problem instances.[3][4] Recently, a version of the algorithm that accounts for continuous and multivariate outputs was proposed, with applications in cellular signaling.[5] There also exists a version of the Blahut–Arimoto algorithm for directed information.[6]

Algorithm

Suppose we have a source $X$ with probability $p(x)$ of any given symbol $x$. We wish to find an encoding $p(\hat{x} \mid x)$ that generates a compressed signal $\hat{X}$ from the original signal $X$ while minimizing the expected distortion $\langle d(x, \hat{x}) \rangle$, where the expectation is taken over the joint probability of $X$ and $\hat{X}$. We can find an encoding that locally minimizes the rate-distortion functional by repeating the following iteration until convergence:

$$p_{t+1}(\hat{x}) = \sum_x p(x)\, p_t(\hat{x} \mid x)$$

$$p_{t+1}(\hat{x} \mid x) = \frac{p_{t+1}(\hat{x}) \exp(-\beta\, d(x, \hat{x}))}{\sum_{\hat{x}'} p_{t+1}(\hat{x}') \exp(-\beta\, d(x, \hat{x}'))}$$

where $\beta$ is a parameter related to the slope of the rate-distortion curve that we are targeting, and thus to how much we favor compression versus distortion (higher $\beta$ means less compression).
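As a rough illustration, the iteration above can be implemented in a few lines of NumPy; the function name, convergence test, and the choice of nats are illustrative assumptions, not from the cited papers.

    import numpy as np

    def blahut_arimoto_rate_distortion(p_x, d, beta, tol=1e-9, max_iter=10_000):
        """Illustrative sketch (names are ours): one point on the
        rate-distortion curve, with the rate in nats.

        p_x is the (n,) source distribution, d the (n, m) distortion matrix
        with d[x, xhat] = d(x, xhat), and beta the slope parameter from the text.
        """
        n, m = d.shape
        q = np.full((n, m), 1.0 / m)          # initial conditional p(xhat | x)
        A = np.exp(-beta * d)                 # factors exp(-beta d(x, xhat))
        for _ in range(max_iter):
            r = p_x @ q                       # p_{t+1}(xhat) = sum_x p(x) p_t(xhat | x)
            q_new = r * A                     # unnormalised p_{t+1}(xhat | x)
            q_new /= q_new.sum(axis=1, keepdims=True)
            converged = np.max(np.abs(q_new - q)) < tol
            q = q_new
            if converged:
                break
        r = p_x @ q
        rate = np.sum(p_x[:, None] * q * np.log(q / r))   # I(X; Xhat)
        distortion = np.sum(p_x[:, None] * q * d)         # expected distortion
        return rate, distortion, q

Sweeping beta over a range of values and collecting the resulting (distortion, rate) pairs traces out an approximation of the rate-distortion curve.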

References

  1. ^ Arimoto, Suguru (1972), "An algorithm for computing the capacity of arbitrary discrete memoryless channels", IEEE Transactions on Information Theory, 18 (1): 14–20, doi:10.1109/TIT.1972.1054753, S2CID 8408706.
  2. ^ Blahut, Richard (1972), "Computation of channel capacity and rate-distortion functions", IEEE Transactions on Information Theory, 18 (4): 460–473, CiteSeerX 10.1.1.133.7174, doi:10.1109/TIT.1972.1054855.
  3. ^ Pascal O. Vontobel (2002). A Generalized Blahut–Arimoto Algorithm. CiteSeerX 10.1.1.1.2567.
  4. ^ Iddo Naiss; Haim Permuter (2010). "Extension of the Blahut-Arimoto algorithm for maximizing directed information". arXiv:1012.5071v2 [cs.IT].
  5. ^ Tomasz Jetka; Karol Nienaltowski; Tomasz Winarski; Slawomir Blonski; Michal Komorowski (2019), "Information-theoretic analysis of multivariate single-cell signaling responses", PLOS Computational Biology, 15 (7): e1007132, arXiv:1808.05581, Bibcode:2019PLSCB..15E7132J, doi:10.1371/journal.pcbi.1007132, PMC 6655862, PMID 31299056.
  6. ^ Naiss, Iddo; Permuter, Haim H. (January 2013). "Extension of the Blahut–Arimoto Algorithm for Maximizing Directed Information". IEEE Transactions on Information Theory. 59 (1): 204–222. doi:10.1109/TIT.2012.2214202.