Information theory and measure theory
If we associate the existence of sets $\tilde{X}$ and $\tilde{Y}$ with arbitrary discrete random variables X and Y, somehow representing the information borne by X and Y, respectively, such that:
- $\mu(\tilde{X} \cap \tilde{Y}) = 0$ whenever X and Y are independent, and
- $\tilde{X} = \tilde{Y}$ whenever X and Y are such that either one is completely determined by the other (i.e. by a bijection);
where $\mu$ is a measure over these sets, and we set:

$$
\begin{aligned}
H(X) &= \mu(\tilde{X}), \\
H(Y) &= \mu(\tilde{Y}), \\
H(X,Y) &= \mu(\tilde{X} \cup \tilde{Y}), \\
H(X \mid Y) &= \mu(\tilde{X} \setminus \tilde{Y}), \\
H(Y \mid X) &= \mu(\tilde{Y} \setminus \tilde{X}), \\
I(X;Y) &= \mu(\tilde{X} \cap \tilde{Y});
\end{aligned}
$$
we find that Shannon's "measure" of information content satisfies all the postulates and basic properties of a formal measure over sets. This can be a handy mnemonic device in some situations. Certain extensions to the definitions of Shannon's basic measures of information are necessary to deal with the σ-algebra generated by the sets that would be associated to three or more arbitrary random variables. (See Reza pp. 106–108 for an informal but rather complete discussion.) Namely, the joint entropy $H(X,Y,Z,\ldots)$ needs to be defined in the obvious way as the entropy of a joint distribution, and an extended transinformation (multivariate mutual information) $I(X;Y;Z;\ldots)$ defined in a suitable manner so that we can set:

$$
\begin{aligned}
H(X,Y,Z,\ldots) &= \mu(\tilde{X} \cup \tilde{Y} \cup \tilde{Z} \cup \ldots), \\
I(X;Y;Z;\ldots) &= \mu(\tilde{X} \cap \tilde{Y} \cap \tilde{Z} \cap \ldots);
\end{aligned}
$$
in order to define the (signed) measure over the whole σ-algebra. (It is interesting to note that the mutual information of three or more random variables can be negative as well as positive: let X and Y be two independent fair coin flips, and let Z be their exclusive or. Then $I(X;Y) = 0$ while $I(X;Y \mid Z) = 1$ bit, so $I(X;Y;Z) = I(X;Y) - I(X;Y \mid Z) = -1$ bit.)
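For instance, under the assignments above the familiar two-variable identities of information theory become ordinary identities of the measure $\mu$ (a brief illustration of the correspondence, written out here for concreteness):

$$
\begin{aligned}
H(X,Y) = H(X) + H(Y \mid X) \quad &\longleftrightarrow \quad \mu(\tilde{X} \cup \tilde{Y}) = \mu(\tilde{X}) + \mu(\tilde{Y} \setminus \tilde{X}), \\
I(X;Y) = H(X) + H(Y) - H(X,Y) \quad &\longleftrightarrow \quad \mu(\tilde{X} \cap \tilde{Y}) = \mu(\tilde{X}) + \mu(\tilde{Y}) - \mu(\tilde{X} \cup \tilde{Y}), \\
I(X;Y) = H(X) - H(X \mid Y) \quad &\longleftrightarrow \quad \mu(\tilde{X} \cap \tilde{Y}) = \mu(\tilde{X}) - \mu(\tilde{X} \setminus \tilde{Y}).
\end{aligned}
$$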
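As a numerical check of the parenthetical example, the short Python sketch below (illustrative only; the `entropy` helper and the inclusion–exclusion bookkeeping are choices made for this sketch, not part of the source) builds the joint distribution of two independent fair coin flips X, Y and their exclusive or Z, and recovers $I(X;Y) = 0$ and $I(X;Y;Z) = -1$ bit.

```python
from collections import Counter
from itertools import product
from math import log2

def entropy(joint, idx):
    """Shannon entropy (in bits) of the marginal of `joint` on the coordinates in `idx`."""
    marginal = Counter()
    for outcome, p in joint.items():
        marginal[tuple(outcome[i] for i in idx)] += p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

# Joint distribution of (X, Y, Z): X, Y independent fair coin flips, Z = X XOR Y.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

H_X, H_Y, H_Z = entropy(joint, [0]), entropy(joint, [1]), entropy(joint, [2])
H_XY, H_XZ, H_YZ = entropy(joint, [0, 1]), entropy(joint, [0, 2]), entropy(joint, [1, 2])
H_XYZ = entropy(joint, [0, 1, 2])

# I(X;Y) = H(X) + H(Y) - H(X,Y): inclusion-exclusion for two "information sets".
I_XY = H_X + H_Y - H_XY
# I(X;Y;Z) = H(X) + H(Y) + H(Z) - H(X,Y) - H(X,Z) - H(Y,Z) + H(X,Y,Z):
# inclusion-exclusion for three sets, using the set/entropy correspondence above.
I_XYZ = H_X + H_Y + H_Z - H_XY - H_XZ - H_YZ + H_XYZ

print(f"I(X;Y)   = {I_XY:.1f} bit")   # 0.0: X and Y are independent
print(f"I(X;Y;Z) = {I_XYZ:.1f} bit")  # -1.0: negative triple mutual information
```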
This formulation reiterates and clarifies the fundamental properties of these basic concepts of information theory.