In information theory, Shannon's noiseless coding theorem places an upper and a lower bound on the minimal possible expected length of codewords as a function of the entropy of the input word (which is viewed as a random variable) and of the size of the target alphabet.
Shannon's statement
Let X be a random variable taking values in some finite alphabet $\Sigma_1$ and let f be a decipherable code from $\Sigma_1^*$ to $\Sigma_2^*$, where $|\Sigma_2| = a$. Let S denote the resulting wordlength of f(X).
If f is optimal in the sense that it has the minimal expected wordlength for X, then

$$\frac{H(X)}{\log_2 a} \;\leq\; \mathbb{E}S \;<\; \frac{H(X)}{\log_2 a} + 1.$$
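These bounds can be checked concretely. The sketch below (illustrative, not from the original article; the function names and the example distribution are assumptions) computes H(X) and the expected wordlength of a binary Huffman code, which is an optimal prefix-free code for a = 2, so the theorem predicts H(X) ≤ 𝔼S < H(X) + 1.

```python
import heapq
import math

def entropy_bits(probs):
    """Shannon entropy H(X) in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code (optimal for a = 2)."""
    # Heap entries: (subtree weight, tiebreaker, symbol indices in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1            # each merge deepens every leaf by one bit
        heapq.heappush(heap, (w1 + w2, counter, syms1 + syms2))
        counter += 1
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]      # hypothetical example distribution
H = entropy_bits(probs)
ES = sum(p * l for p, l in zip(probs, huffman_lengths(probs)))
# With a = 2 the theorem reads H(X) <= E[S] < H(X) + 1.
assert H <= ES < H + 1
print(H, ES)                           # for this dyadic distribution both equal 1.75
```

For a dyadic distribution such as this one the lower bound is met with equality; for other distributions 𝔼S falls strictly between the two bounds.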
Proof
Let $s_i$ denote the wordlength of each possible $x_i$ ($i = 1, \ldots, n$). Define $q_i = a^{-s_i}/C$, where C is chosen so that $\sum_{i=1}^{n} q_i = 1$.
Then

$$\begin{aligned}
H(X) &= -\sum_{i=1}^{n} p_i \log_2 p_i \\
&\leq -\sum_{i=1}^{n} p_i \log_2 q_i \\
&= -\sum_{i=1}^{n} p_i \log_2 a^{-s_i} + \log_2 C \\
&\leq -\sum_{i=1}^{n} p_i \log_2 a^{-s_i} \\
&= \sum_{i=1}^{n} s_i p_i \log_2 a \;=\; \mathbb{E}S \, \log_2 a
\end{aligned}$$
where the second line follows from Gibbs' inequality and the fourth line follows from Kraft's inequality:

$$C = \sum_{i=1}^{n} a^{-s_i} \leq 1$$

so $\log C \leq 0$.
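The chain of inequalities above can be exercised numerically. The following sketch (a sanity check under assumed example values, not part of the original proof; the helper name is hypothetical) verifies Gibbs' and Kraft's inequalities for a concrete distribution and set of wordlengths.

```python
import math

def check_lower_bound(probs, lengths, a=2):
    """Numerically verify H(X) <= -sum p_i log2 q_i <= E[S] log2 a."""
    C = sum(a ** -s for s in lengths)              # Kraft sum
    q = [a ** -s / C for s in lengths]             # q_i = a^{-s_i} / C
    H = -sum(p * math.log2(p) for p in probs if p > 0)
    cross = -sum(p * math.log2(qi) for p, qi in zip(probs, q))
    ES = sum(p * s for p, s in zip(probs, lengths))
    assert H <= cross + 1e-12                      # Gibbs' inequality
    assert C <= 1 + 1e-12                          # Kraft's inequality
    assert cross <= ES * math.log2(a) + 1e-12     # uses log2 C <= 0
    return H, cross, ES * math.log2(a)

print(check_lower_bound([0.5, 0.25, 0.125, 0.125], [1, 2, 3, 3]))
```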
For the second inequality we may set $s_i = \lceil -\log_a p_i \rceil$, so that

$$-\log_a p_i \leq s_i < -\log_a p_i + 1$$

and so $a^{-s_i} \leq p_i$ and

$$\sum_{i=1}^{n} a^{-s_i} \leq \sum_{i=1}^{n} p_i = 1,$$

and so by Kraft's inequality there exists a prefix-free code having those wordlengths; one such construction is sketched below.
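A concrete way to realize such a code is the canonical construction (an illustrative sketch; the function name is an assumption): sort the lengths and assign codewords in counting order, which yields a prefix-free binary code whenever Kraft's inequality holds.

```python
def codewords_from_lengths(lengths):
    """Canonical prefix-free binary codewords for the given lengths.

    Works whenever sum(2**-l for l in lengths) <= 1 (Kraft's inequality).
    """
    codewords, prev_len, next_val = [], 0, 0
    for l in sorted(lengths):
        next_val <<= (l - prev_len)    # extend to the new length with zeros
        codewords.append(format(next_val, '0{}b'.format(l)))
        next_val += 1                  # step past the subtree just used
        prev_len = l
    return codewords

print(codewords_from_lengths([2, 2, 3, 4]))   # ['00', '01', '100', '1010']
```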
Thus the minimal S satisfies

$$\begin{aligned}
\mathbb{E}S &= \sum_{i=1}^{n} p_i s_i \\
&< \sum_{i=1}^{n} p_i \left( -\log_a p_i + 1 \right) \\
&= \sum_{i=1}^{n} -p_i \frac{\log_2 p_i}{\log_2 a} + 1 \\
&= \frac{H(X)}{\log_2 a} + 1.
\end{aligned}$$
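The whole upper-bound argument can be run end to end (an illustrative sketch; the example distribution is an assumption): compute the Shannon lengths $s_i = \lceil -\log_a p_i \rceil$, confirm Kraft's inequality so a prefix-free code exists, and check the bound on the expected wordlength.

```python
import math

a = 2
probs = [0.4, 0.3, 0.2, 0.1]                      # hypothetical example distribution
s = [math.ceil(-math.log(p, a)) for p in probs]   # s_i = ceil(-log_a p_i)
assert sum(a ** -si for si in s) <= 1             # Kraft: a prefix-free code exists
H = -sum(p * math.log2(p) for p in probs)
ES = sum(p * si for p, si in zip(probs, s))
assert ES < H / math.log2(a) + 1                  # the theorem's upper bound
print(s, ES, H / math.log2(a) + 1)                # [2, 2, 3, 4], 2.4, ~2.846
```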