Talk:Canonical Huffman code
Appearance
This article probably needs the examples rewriting as Canonical Huffman coding starts with all-zeros for the shortest code and ends with all-ones for the longest code:
A 0 B 10 C 110 D 111
This is opposite to the examples as given. The method for construction is then:
code = 0 while more symbols: print symbol, code code = code + 1 if next bit length: code = code << 1
Sladen 13:54, 15 December 2006 (UTC)
- Done this and attempted to clean up. Please check over that it actually makes sense and is correct. Sladen 02:25, 20 December 2006 (UTC)
3 March 2007
- The reason the longest codes typically have all-zeros is that this version of canonical Huffman coding guarantees that all non-zero bits will appear in the final ceil(log2(alphabetsize)) bits of the pattern. This allows the pattern to be stored in a fixed-size variable since the leading 0s don't need to be stored.
Better examples
Two good example words might be headache and beachhead. Both of these words are at least 8 letters long, contain repeated letters and only use the first eight letters in the alphabet. cabbage and baggage are seven letters a piece.
headache
symbol weight Huffman bits code canonical a 2 00 2 0 00 b 0 0 c 1 110 3 6 110 d 1 111 3 7 111 e 2 01 2 1 01 f 0 0 g 0 0 h 2 10 2 2 10
8,- 4,- 4,- 2,a 2,e 2,h 2,- c,1 d,1 00 01 10 110 111
2,0,3,3,2,0,0,2
Could be combined with ace headache, abba had a bad headache.
symbol weight Huffman bits code canonical a 7 00 2 0 00 b 3 10 2 1 01 c 1 111 3 4 100 d 3 110 3 5 101 e 2 101 3 6 110 f 0 0 g 0 0 h 2 110 3 7 111
19,- 10,- 9,- 7,a 3,b 6,- 3,- 3,d 2,e 2,h 1,c 00 01 100 101 110 111
2,2,3,3,3,0,0,3
beachhead
symbol weight huffman? code bits a 2 00 0 2 b 1 110 6 3 c 1 1110 14 4 d 1 1111 15 4 e 2 01 1 2 f 0 - 0 g 0 - 0 h 2 10 2 2
Could be combined with faded beachhead cafe.