Data compression/Huffman coding

Huffman coding is an algorithm used for Data compression.

The basic idea is borrowed from an older and slightly inferior method called

The text to be compressed is considered as a string of symbols. Symbols that are likely to be frequent are represented by a short sequence of bits, and symbols that are likely to be rare are represented by a longer sequence of bits.

Huffman coding uses a specific method for choosing the representations for each symbol. It has been proven that Huffman coding is the most effective compression method of this type-- that is, no other choice of symbol coding will produce a smaller output when the actual symbol frequencies agree with those used to create the code.

There are variations. The frequencies used can be generic ones for the application domain that are based on average experience, or they can be the actual frequencies found in the text being compressed. (This variation requires that a frequency table or other hint as to the encoding must be stored with the compressed text.) Or they can be calculated dynamically based on recent actual frequencies in the text. This is called Adaptive Huffman Coding.

Huffman coding today is often used as a "back-end" to some other compression method. This is seen in Data compression/deflation

and in some options of JPEG for instance.

Huffman coding is optimal when the frequencies of input characters are powers of two.