User:C-processor/Polar Tree
An editor has performed a search and found that sufficient sources exist to establish the subject's notability. |
Polar Tree is set of binary codes assigned to a symbol set in the process known in information theory as entropy encoding. The goal is to reduce the length taken by encoded fragment. Other binary trees created for the same purpose are Shannon-Fano coding and Huffman coding. All three methods of constructing codes are known as prefix coding because non of these codes is the prefix of another code, which allows unmistakably to restore sequence of symbols from encoded binary stream. Historically Shannon-Fano coding was introduced earlier than Huffman coding and as, it was proven, Huffman coding provides shortest possible encoded message for a given set of frequencies of occurrence. On that reason Huffman coding replaced Shannon-Fano coding for a very long time until adaptive entropy encoding technique was introduced and replaced completely static encoding. In adaptive encoding the tree is not output into encoded stream but updated synchronously during both encoding and decoding. The tree is created based on already seen symbols and applied to those new symbols that appear in the message. On that reason the difference in codes is not as important as precision in estimation frequencies for symbols that did not yet occur. For example, presume in adaptive encoding we start from default tree and update it after every 1000 symbols. In both encoding and decoding we may use statistics collected on first 1000 symbols and recreate the tree. New tree will be applied for encoding of next 1000 symbols, that are not yet occurred, and usage of Shannon-Fano coding or Huffman coding makes fraction of percent in the length of encoded fragment, while adaptation interval, strategy in updating frequencies or other mechanism of tree adjustment may provide significantly higher contribution into compression ratio. On that reason, in adaptive algorithms, the most important role plays the speed and convenience in recreation of the tree rather than its optimality, because optimality exists only for provided frequencies while actual frequencies in adaptive encoding are always different from presumed. Polar tree, in general case, is different from Huffman tree although may coincide in many cases, but it can be created or updated by very small number of operation, what makes it attractive in usage in different adaptive algorithms, where trees have to be updated frequently. In the table below the codes for all three trees are presented for the same example as considered in Shannon-Fano coding.
Symbol | Frequency | Huffman codes | Shannon-Fano codes | Polar codes |
---|---|---|---|---|
A | 15 | 0 | 00 | 0 |
B | 7 | 100 | 01 | 100 |
C | 6 | 101 | 10 | 101 |
D | 6 | 110 | 110 | 110 |
E | 5 | 111 | 111 | 111 |
In this particular case Polar codes are identical to Huffman codes, but in general case there may be some difference. Same is true for Shannon-Fano coding, they might be identical to Huffman as well. The algorithm of constructing of Polar codes goes over code lengths generation as it is shown in the next table.
Symbol | Original frequency | First run | Second run | Third run | Code length | Code |
---|---|---|---|---|---|---|
A | 15 | 8 | 16 | 32 | 1 | 0 |
B | 7 | 4 | 8 | 8 | 3 | 100 |
C | 6 | 4 | 8 | 8 | 3 | 101 |
D | 6 | 4 | 8 | 8 | 3 | 110 |
E | 5 | 4 | 8 | 8 | 3 | 111 |