Tunstall coding

In computer science and information theory, Tunstall coding is a form of entropy coding used for lossless data compression.

History

Tunstall coding was the subject of Brian Parker Tunstall's PhD thesis in 1967, while at Georgia Institute of Technology. The subject of that thesis was "Synthesis of noiseless compression codes" ^[1].

Its design is a precursor to Lempel-Ziv.

Relation to similar codes

Unlike variable-length codes, which include Huffman and Lempel–Ziv coding, Tunstall coding is a code which maps source symbols to a fixed number of bits.

Unlike typical set encoding, Tunstall coding parses a stochastic source with codewords of variable length.

Algorithm

Example

Tunstall coding — one iteration.

Tunstall coding — two iterations.

Limitations

Tunstall coding requires the algorithm to know, prior to the parsing operation, what the distribution of probabilities for each letter of the alphabet is. This issue is shared with Huffman coding.

Its requiring a fixed-length word output makes it lesser than Lempel-Ziv, which has a similar dictionary-based design, but with a variable-sized word output.

References

^ [1]

[1] [1]

[1]