
Universal code (data compression)

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Daniel Staal at 19:04, 23 August 2005 (Created page from disambiguation with 'Universal Code'.).

In data compression, a universal code maps the integers (representing messages) onto self-delimiting binary codewords. Longer binary codewords are mapped to less probable messages. (Self-delimiting codes are also called prefix-free codes or "instantaneously decodable" codes).
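The "self-delimiting" property can be illustrated with unary coding, the simplest prefix-free code (a sketch for illustration only; unary is self-delimiting but not universal, since its codeword lengths grow linearly with the integer rather than logarithmically):

```python
def unary_encode(n):
    """Encode a positive integer n as (n - 1) zeros followed by a one."""
    return "0" * (n - 1) + "1"

def unary_decode_stream(bits):
    """Decode a concatenation of unary codewords with no separators."""
    values, count = [], 0
    for bit in bits:
        count += 1
        if bit == "1":              # the terminating one ends each codeword
            values.append(count)
            count = 0
    return values

# Codewords can be concatenated and still decoded unambiguously:
stream = unary_encode(3) + unary_encode(1) + unary_encode(2)
print(stream)                       # "001101"
print(unary_decode_stream(stream))  # [3, 1, 2]
```

Because every codeword signals its own end, the decoder never needs separators or length headers; this is what makes such codes "instantaneously decodable".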

Unlike other compression codes such as Huffman coding or fixed-length codes, universal codes do not require the receiver or the transmitter to know the maximum integer (number of potential messages) ahead of time.

Universal codes include Elias gamma coding, Elias delta coding, and Fibonacci coding.
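As a sketch of how such a code maps integers to self-delimiting codewords, Elias gamma coding can be written in a few lines (an illustrative implementation, not a reference one): for n ≥ 1 it writes ⌊log2 n⌋ zeros followed by the binary representation of n.

```python
def elias_gamma_encode(n):
    """Elias gamma code of n >= 1: floor(log2 n) zeros, then n in binary."""
    b = bin(n)[2:]                    # binary representation, floor(log2 n) + 1 bits
    return "0" * (len(b) - 1) + b     # the zero run announces the value's length

def elias_gamma_decode(bits):
    """Decode one gamma codeword from the front of a bit string."""
    zeros = 0
    while bits[zeros] == "0":         # count the length prefix
        zeros += 1
    value = int(bits[zeros:2 * zeros + 1], 2)   # value occupies zeros + 1 bits
    return value, bits[2 * zeros + 1:]

print(elias_gamma_encode(9))                 # "0001001"
print(elias_gamma_decode("0001001" + "1"))   # (9, "1") -- leftover bits returned
```

Note that larger integers get longer codewords, matching the requirement that less probable messages receive longer codes.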

Huffman coding, range encoding, and arithmetic encoding (when they can be used) give compression at least as good as, and often better than, any universal code.

However, universal codes are useful when Huffman coding cannot be used, for example when one does not know the exact probability of each message but only the messages' ranking (this message is more probable than that one).

Universal codes are also useful when Huffman codes are inconvenient. For example, when the transmitter but not the receiver knows the probabilities of the messages, Huffman coding requires an overhead of transmitting those probabilities to the receiver; using a universal code avoids that overhead.

Each universal code, like each other self-delimiting (prefix-free) binary code, has its own "implied probability distribution". If a set of messages happens to have a probability distribution similar enough to that implied by some universal code, then the Huffman code for that set of messages will be equivalent to that universal code. Since universal codes are simpler and faster to encode and decode than Huffman codes (which are, in turn, simpler and faster than range encoding and arithmetic encoding), the universal code is preferable in cases where the codes are equivalent. [1]

Some codes, such as Elias delta coding and Huffman coding, are "asymptotically optimal", while others, such as Elias gamma coding, are not. [2]
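The difference can be seen by comparing codeword lengths. Gamma uses 2⌊log2 n⌋ + 1 bits, while delta gamma-codes the bit length of n and then appends n's remaining low-order bits, using roughly log2 n + 2 log2 log2 n bits, which approaches the ideal log2 n for large n. A small sketch (helper names are illustrative, not standard):

```python
def elias_gamma_encode(n):
    """Elias gamma code of n >= 1: floor(log2 n) zeros, then n in binary."""
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b

def elias_delta_encode(n):
    """Elias delta: gamma-code the bit length of n, then n's low-order bits."""
    b = bin(n)[2:]
    return elias_gamma_encode(len(b)) + b[1:]   # leading 1 of n is implicit

# Delta starts slightly longer but overtakes gamma as n grows:
for n in (10, 1000, 1_000_000):
    print(n, len(elias_gamma_encode(n)), len(elias_delta_encode(n)))
# 10       -> gamma  7 bits, delta  8 bits
# 1000000  -> gamma 39 bits, delta 28 bits
```

For large n, delta's length prefix costs only O(log log n) bits on top of the ~log2 n payload, which is why delta is asymptotically optimal and gamma (whose prefix costs a full extra log2 n bits) is not.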

The figures below compare Elias gamma, Elias delta, Fibonacci, and several Rice codings by the number of bits used to encode integer values. A baseline binary encoding, for the case where the size is already known, is also included. The figures show that Fibonacci and Elias delta coding have good overall performance. If the range of values to be encoded is known, and values rarely fall outside it, a Rice coding can be chosen whose sweet spot is hard to beat.

[Figure: Fibonacci, Elias gamma, and Elias delta vs. binary coding]
[Figure: Rice with k = 2, 3, 4, 5, 8, 16 vs. binary]
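The Rice "sweet spot" behaviour can be sketched as follows (an illustrative implementation for nonnegative integers, with a hypothetical function name): with parameter k, the quotient n >> k is written in unary, followed by the k low-order bits of n, so values below 2^k cost exactly k + 1 bits while much larger values pay linearly in the unary quotient.

```python
def rice_encode(n, k):
    """Rice code with parameter k (n >= 0): quotient n >> k in unary
    (that many ones plus a terminating zero), then the k low-order bits."""
    q = n >> k
    return "1" * q + "0" + format(n & ((1 << k) - 1), f"0{k}b")

# Small values near the tuned range are very cheap; outliers are expensive:
for k in (2, 4):
    for n in (5, 9, 200):
        print(f"k={k} n={n}: {len(rice_encode(n, k))} bits")
```

This is why a Rice code tuned to a known value range can beat the general-purpose universal codes inside that range, while degrading badly on rare large outliers.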