Data differencing
In computer science and information theory, data differencing or differential compression is producing a difference between two sets of data – a source and a target. Formally, a data differencing algorithm takes as input source data and target data, and produces difference data ("differencing"), such that given the source data and the difference data, one can reconstruct the target data ("patching" the source with the difference to produce the target).
Examples
One of the best-known examples of data differencing is the diff utility, which produces line-by-line differences of text files (and in some implementations, binary files). Differencing of general binary files goes under the rubric of delta encoding, with a widely-used example being the algorithm used in rsync. A standardized generic differencing format is VCDIFF, implemented in such utilities as Xdelta version 3. A high-efficiency (small patch files) differencing program is bsdiff, which is based on bzip2 compression, demonstrating the close connection between differencing and compression.
Connection with data compression
Data compression can be seen as a special case of data differencing[1][2] – data differencing consists of producing a difference given a source and a target, with patching producing a target given a source and a difference, while data compression consists of producing a compressed file given a target, and decompression consists of producing a target given only a compressed file. Thus, one can consider data compression as data differencing with empty source data, the compressed file corresponding to a "difference from nothing". This is the same as considering absolute entropy (corresponding to data compression) as a special case of relative entropy (corresponding to data differencing) with no initial data.
When one wishes to emphasize the connection, one may use the term differential compression to refer to data differencing.
A dictionary translating between the terminology of the two fields is given as:
compression | differencing |
---|---|
none | source |
uncompressed | target |
compressed | difference, delta |
compression | differencing |
decompression | patching |
See also
References
- ^ RFC 3284
- ^ Korn, D.G.; Vo, K.P. (1995), B. Krishnamurthy (ed.), Vdelta: Differencing and Compression, Practical Reusable Unix Software, John Wiley & Sons