Jump to content

Data differencing

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Nbarth (talk | contribs) at 06:20, 26 November 2009 (examples). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computer science and information theory, data differencing is producing a difference between two sets of data – a source and a target. Formally, a data differencing algorithm takes as input source data and target data, and produces difference data ("differencing"), such that given the source data and the difference data, one can reconstruct the target data ("patching" the source with the difference to produce the target).

Examples

One of the best-known examples of data differencing is the diff utility, which produces line-by-line differences of text files (and in some implementations, binary files). Differencing of general binary files goes under the rubric of delta encoding, with a widely-used example being the algorithm used in rsync. A standardized generic differencing format is VCDIFF, implemented in such utilities as Xdelta.

Connection with data compression

Data compression can be seen as a special case of data differencing[1][2] – data differencing consists of producing a difference given a source and a target, with patching producing a target given a source and a difference, while data compression consists of producing a compressed file given a target, and decompression consists of producing a target given only a compressed file. Thus, one can consider data compression as data differencing with empty source data, the compressed file corresponding to a "difference from nothing". This is the same as considering absolute entropy (corresponding to data compression) as a special case of relative entropy (corresponding to data differencing) with no initial data.

See also

References

  1. ^ RFC 3284
  2. ^ Korn, D.G.; Vo, K.P. (1995), B. Krishnamurthy (ed.), Vdelta: Differencing and Compression, Practical Reusable Unix Software, John Wiley & Sons