User:C-processor/Polar Tree
An editor has performed a search and found that sufficient sources exist to establish the subject's notability. |
Polar Tree is set of binary codes assigned to a symbol set in the process known in information theory as entropy encoding. The goal is to reduce the length taken by encoded fragment. Other binary trees created for the same purpose are Shannon-Fano coding and Huffman coding. All three methods of constructing codes are known as prefix codes because non of these codes is the prefix of another code, which allows unmistakably to restore sequence of symbols from encoded binary stream. Historically Shannon-Fano coding was introduced earlier than Huffman coding and as, it was proven later, Huffman coding provides shortest possible encoded message for a given set of frequencies of occurrence. On that reason Huffman coding replaced Shannon-Fano coding for a very long time until adaptive entropy encoding technique was introduced and replaced completely static encoding. In adaptive encoding the tree is not output into encoded stream but updated synchronously during both encoding and decoding. The tree is created based on already seen symbols and applied to those new symbols that appear in the message. On that reason the difference in codes is not as important as precision in estimation frequencies for symbols that did not yet occur. For example, presume in adaptive encoding we start from default tree and update it after every 1000 symbols. In both encoding and decoding we may use statistics collected on first 1000 symbols and recreate the tree. New tree will be applied for encoding of next 1000 symbols, that are not yet occurred, and usage of Shannon-Fano coding or Huffman coding makes fraction of percent in the length of encoded fragment, while adaptation interval, strategy in updating frequencies or other mechanism of tree adjustment may provide significantly higher contribution into compression ratio. On that reason, in adaptive algorithms, the most important role plays the speed and convenience in recreation the tree rather than its optimality, because optimality exists only for provided frequencies while actual frequencies in adaptive encoding are always different from presumed.
Symbol | Frequency | Huffman codes | Shannon-Fano codes | Polar codes |
---|---|---|---|---|
A | 15 | 0 | 00 | 0 |
B | 7 | 100 | 01 | 100 |
C | 6 | 101 | 10 | 101 |
D | 6 | 110 | 110 | 110 |
E | 5 | 111 | 111 | 111 |