
Binary entropy function

Entropy of a Bernoulli trial as a function of success probability, called the binary entropy function.

In information theory, the binary entropy function, denoted $\operatorname{H}(p)$ or $\operatorname{H}_\text{b}(p)$, is defined as the entropy of a Bernoulli process with probability of success $p$. Mathematically, the Bernoulli trial is modelled as a random variable $X$ that can take on only two values: 0 and 1. The event $X = 1$ is considered a success and the event $X = 0$ is considered a failure. (These two events are mutually exclusive and exhaustive.)

If $\operatorname{P}(X=1) = p$, then $\operatorname{P}(X=0) = 1 - p$ and the entropy of $X$ is given by

$$\operatorname{H}(X) = \operatorname{H}_\text{b}(p) = -p \log p - (1 - p) \log (1 - p),$$

where $0 \log 0$ is taken to be 0. The logarithms in this formula are usually taken (as shown in the graph) to the base 2. See binary logarithm.
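
As a quick computational illustration of this definition, the sketch below (in Python, with the arbitrarily named helper binary_entropy) evaluates $\operatorname{H}_\text{b}(p)$ in bits:

from math import log2

def binary_entropy(p: float) -> float:
    # Binary entropy H_b(p) in bits; 0*log(0) is taken to be 0.
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(binary_entropy(0.5))   # 1.0 bit, the maximum
print(binary_entropy(0.25))  # about 0.811 bits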

When $p = \tfrac{1}{2}$, the binary entropy function attains its maximum value. This is the case of the unbiased bit, the most common unit of information entropy.
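
For instance, substituting $p = \tfrac{1}{2}$ into the formula above gives

$$\operatorname{H}_\text{b}\!\left(\tfrac{1}{2}\right) = -\tfrac{1}{2} \log_2 \tfrac{1}{2} - \tfrac{1}{2} \log_2 \tfrac{1}{2} = \tfrac{1}{2} + \tfrac{1}{2} = 1 \text{ bit}.$$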

$\operatorname{H}_\text{b}(p)$ is distinguished from the entropy function $\operatorname{H}(X)$ by its taking a single scalar constant parameter. For tutorial purposes, in which the reader may not distinguish the appropriate function by its argument, $\operatorname{H}(p)$ is often used; however, this could confuse this function with the analogous function $\operatorname{H}_2(X)$ related to Rényi entropy, so $\operatorname{H}_\text{b}(p)$ (with "b" not in italics) should be used to dispel ambiguity.

Explanation

In terms of information theory, entropy is considered to be a measure of the uncertainty in a message. To put it intuitively, suppose $p = 0$. At this probability, the event is certain never to occur, and so there is no uncertainty at all, leading to an entropy of 0. If $p = 1$, the result is again certain, so the entropy is 0 here as well. When $p = \tfrac{1}{2}$, the uncertainty is at a maximum; if one were to place a fair bet on the outcome in this case, there is no advantage to be gained with prior knowledge of the probabilities. In this case, the entropy is maximum at a value of 1 bit. Intermediate values fall between these cases; for instance, if $p = \tfrac{3}{4}$, there is still a measure of uncertainty on the outcome, but one can still predict the outcome correctly more often than not, so the uncertainty measure, or entropy, is less than 1 full bit.
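
As a worked check of this last case,

$$\operatorname{H}_\text{b}\!\left(\tfrac{3}{4}\right) = -\tfrac{3}{4} \log_2 \tfrac{3}{4} - \tfrac{1}{4} \log_2 \tfrac{1}{4} = \tfrac{3}{4} \log_2 \tfrac{4}{3} + \tfrac{1}{2} \approx 0.811 \text{ bits},$$

i.e. noticeably less than the full bit of uncertainty at $p = \tfrac{1}{2}$.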

Derivative

The derivative of the binary entropy function may be expressed as the negative of the logit function:

$$\frac{d}{dp} \operatorname{H}_\text{b}(p) = -\operatorname{logit}_2(p) = -\log_2\left(\frac{p}{1-p}\right).$$
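
This follows by differentiating the definition term by term:

$$\frac{d}{dp}\operatorname{H}_\text{b}(p) = -\log_2 p - \frac{1}{\ln 2} + \log_2(1-p) + \frac{1}{\ln 2} = \log_2\frac{1-p}{p} = -\log_2\frac{p}{1-p}.$$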

Taylor series

The Taylor series of the binary entropy function in a neighborhood of $\tfrac{1}{2}$ is

$$\operatorname{H}_\text{b}(p) = 1 - \frac{1}{2\ln 2}\sum_{n=1}^{\infty} \frac{(1-2p)^{2n}}{n(2n-1)}$$

for $0 \le p \le 1$.
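
A short numerical sketch in Python, evaluating partial sums of this series at $p = 0.4$ for illustration, shows how quickly they approach the exact value:

from math import log, log2

def taylor_binary_entropy(p: float, terms: int) -> float:
    # Partial sum of the Taylor series of H_b(p) about p = 1/2.
    s = sum((1 - 2 * p) ** (2 * n) / (n * (2 * n - 1)) for n in range(1, terms + 1))
    return 1 - s / (2 * log(2))

exact = -0.4 * log2(0.4) - 0.6 * log2(0.6)   # about 0.97095
print(taylor_binary_entropy(0.4, terms=1))   # about 0.97115
print(taylor_binary_entropy(0.4, terms=5))   # about 0.97095
print(exact)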
