
LZ4 (compression algorithm)

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Staticd (talk | contribs) at 11:48, 19 July 2013 (Created page with ' '''LZ4''' is a lossless data compression algorithm that is focused on compression and decompression speed. It is belongs to the LZ77 family of ...'). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

LZ4 is a lossless data compression algorithm that is focused on compression and decompression speed. It belongs to the LZ77 family of compression schemes.


Features

The algorithm gives a slightly worse compression ratio than the LZO algorithm, which in turn is worse than algorithms like gzip. However, compression speeds are similar to LZO and several times faster than algorithms like gzip, while decompression speeds can be up to twice that of LZO. In the worst case, incompressible data grows by about 0.4%.
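The 0.4% worst-case figure can be checked with a little arithmetic. The Python sketch below assumes that incompressible input is stored as a single literal-only sequence, using the length encoding described under Design (a 4-bit field that saturates at 15, followed by extra bytes). The function name `all_literal_size` is hypothetical, and the sketch ignores any framing overhead.

```python
def all_literal_size(n: int) -> int:
    """Size in bytes of n input bytes stored as one literal-only sequence."""
    size = 1 + n  # one token byte plus the literal bytes themselves
    if n >= 15:
        # The 4-bit literal field saturates at 15; the remaining n - 15 is
        # written as zero or more 255-valued bytes plus one final byte < 255.
        size += (n - 15) // 255 + 1
    return size


for n in (10, 1_000, 1_000_000):
    extra = all_literal_size(n) - n
    print(f"{n:>9} bytes -> {extra:>4} extra bytes ({100 * extra / n:.2f}%)")
```

For large inputs each extra length byte covers 255 literals, so the overhead approaches 1/255, or roughly 0.39%.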

Design

The LZ4 algorithm represents the data as a series of sequences. Each sequence begins with a one-byte token that is split into two 4-bit fields. The first (high) field gives the number of literal bytes that are to be copied to the output. The second (low) field gives the number of bytes to copy from the already decoded output buffer, with 0 representing the minimum match length of 4 bytes. A value of 15 in either field indicates that the length is larger and that an extra byte follows, whose value is added to the length. A value of 255 in such an extra byte indicates that yet another byte follows and is also added. Hence arbitrary lengths are represented by a run of extra bytes containing the value 255, terminated by a byte with a smaller value. The string of literals comes after the token and after any extra bytes needed to indicate the literal length. This is followed by a two-byte little-endian offset that indicates how far back in the output buffer to begin copying. The extra bytes (if any) of the match length come at the end of the sequence.
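The sequence layout above can be read back with a short decoder. The following Python sketch decodes a single raw block (no frame header or checksums) and performs only a basic offset check, so it assumes otherwise well-formed input. It also relies on one rule of the block format that is not spelled out above: the last sequence of a block ends after its literals and carries no offset or match. The names `read_length` and `decompress_block` are hypothetical.

```python
def read_length(src: bytes, pos: int, nibble: int) -> tuple[int, int]:
    """Extend a 4-bit length field: 15 means extra bytes follow, 255 means keep reading."""
    length = nibble
    if nibble == 15:
        while True:
            b = src[pos]
            pos += 1
            length += b
            if b != 255:
                break
    return length, pos


def decompress_block(src: bytes) -> bytes:
    out = bytearray()
    pos = 0
    while pos < len(src):
        token = src[pos]
        pos += 1
        # High nibble: number of literal bytes.
        lit_len, pos = read_length(src, pos, token >> 4)
        out += src[pos:pos + lit_len]
        pos += lit_len
        if pos >= len(src):
            break  # the last sequence carries literals only
        # Two-byte little-endian offset back into the output.
        offset = src[pos] | (src[pos + 1] << 8)
        pos += 2
        # Low nibble: match length minus the minimum match of 4.
        match_len, pos = read_length(src, pos, token & 0x0F)
        match_len += 4
        if offset == 0 or offset > len(out):
            raise ValueError("invalid match offset")
        start = len(out) - offset
        # Copy one byte at a time: the match may overlap the bytes it produces.
        for i in range(match_len):
            out.append(out[start + i])
    return bytes(out)
```

Copying the match byte by byte matters: when the offset is smaller than the match length, the copy reads bytes it has just written, which is how a short pattern expands into a long run (an offset of 1 simply repeats the previous byte).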

Compression can be carried out in a stream or in blocks. Higher compression ratios can be achieved by investing more effort in finding the best matches, which results in both a smaller output and faster decompression.
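To make the effort trade-off concrete, here is a toy greedy block compressor in Python. It is a sketch, not the reference encoder. Its match finder is a plain dictionary that remembers only the most recent position of each 4-byte string, so it takes the first match it finds rather than searching for the best one. It assumes the end-of-block rules of the LZ4 block format: the last match must start at least 12 bytes before the end of the block, and the final 5 bytes are always literals. All names and constants below are illustrative.

```python
MIN_MATCH = 4        # shortest match the format can express
MAX_OFFSET = 65535   # largest distance a 2-byte offset can hold
MFLIMIT = 12         # assumed rule: last match starts >= 12 bytes before the end
LAST_LITERALS = 5    # assumed rule: the final 5 bytes are always literals


def write_extra_length(out: bytearray, value: int) -> None:
    """Append the extra bytes for a length field that saturated at 15."""
    value -= 15
    while value >= 255:
        out.append(255)
        value -= 255
    out.append(value)  # terminating byte, always < 255


def emit_sequence(out: bytearray, literals: bytes, match_len: int = 0, offset: int = 0) -> None:
    """Append one sequence; match_len == 0 means a literal-only final sequence."""
    lit_len = len(literals)
    ml_code = match_len - MIN_MATCH if match_len else 0
    out.append((min(lit_len, 15) << 4) | min(ml_code, 15))  # token
    if lit_len >= 15:
        write_extra_length(out, lit_len)
    out += literals
    if match_len:
        out += offset.to_bytes(2, "little")
        if ml_code >= 15:
            write_extra_length(out, ml_code)


def compress_block(src: bytes) -> bytes:
    """Greedy LZ4-style block compressor (a toy, not the reference encoder)."""
    n = len(src)
    out = bytearray()
    last_seen = {}  # hypothetical match finder: 4-byte string -> most recent position
    anchor = 0      # start of literals not yet emitted
    i = 0
    while i <= n - MFLIMIT:
        key = src[i:i + MIN_MATCH]
        cand = last_seen.get(key)
        last_seen[key] = i
        if cand is not None and i - cand <= MAX_OFFSET:
            # Extend the match greedily, stopping before the last 5 bytes.
            length = MIN_MATCH
            while i + length < n - LAST_LITERALS and src[cand + length] == src[i + length]:
                length += 1
            emit_sequence(out, src[anchor:i], length, i - cand)
            i += length
            anchor = i
        else:
            i += 1
    emit_sequence(out, src[anchor:])  # final sequence: literals only
    return bytes(out)


packed = compress_block(b"abcd" * 5)
print(len(packed), packed)
```

One way to invest more effort is to keep several earlier positions per 4-byte string and choose the longest match among them. This is slower to compress, but it tends to produce fewer, longer sequences.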

Implementation

The reference implementation in C by Yann Collet is licensed under a BSD licence. Ports and bindings exist in various languages, such as Java, C# and Python. LZ4 was also implemented natively in the Linux kernel, starting with version 3.11.
