Talk:Range coding

I've rewritten what was here to be grammatically correct English. Later I hope to go back and edit, to put in detail on actual range encoding, and remove the claim added by Suns which seems (as far as I can tell) to be based solely on one particular implementation of range encoding, not on range encoding itself. (Why is arithmetic coding on the page arithmetic encoding, Huffman coding on the page Huffman encoding, but range encoding on the page Range encoder?)

Range encoding and arithmetic encoding

What exactly is the difference between range and arithmetic encoding? I can't seem to see any difference in principle. This sentence from the article

Arithmatic coding can be thought of as a form of range encoding with the range starting at zero and extending to one.

seems to confirm my suspicion that they are slightly different ways of describing the same thing. CyborgTosser (Only half the battle) 12:10, 23 Jan 2005 (UTC)

That's the tricky thing. They are very nearly the same thing. However, the differences that there are, and the fact that range encoding has a practice that much more closely approximates its theory than arithmetic coding does, may make an important difference legally, in determining what patents do and don't cover. Range encoding, in both theory and practice, involves ranges of integers. Arithmetic coding in theory works with rational numbers between 0 and 1, but since practical considerations cause the precision to be limited to typically 32 bits, it's as if they were using 32-bit ranges of integers. There may be differences that I don't know about -- I don't pretend to be an expert -- but as far as I know, the differences that there are are mostly legal in nature. -- Antaeus Feldspar 19:01, 23 Jan 2005 (UTC)

Yeah exactly! When implemented in computers, range and arithmetic encoders produce the same sequence of bits as far as I can see.

-- (Someone anonymous, who left no signature.)

As far as I can tell (having looked at both arithmetic coding and range coding from time to time), they're the same thing, just interpreted in two, different but related ways. If you've got an arithmetic coder (encoder/decoder), you've got the corresponding range coder - it's the same thing. And vice-versa, too.

With arithmetic coding, the actual arithmetic code, which is a sequence of (coding) symbols (not the sequence of coded symbols), is taken as representing a non-negative rational number in the range [0,1). The representation does not include the initial "0.", because that's always going to be there, so is left as implicit. That rational number is itself taken as representing the range (of real numbers) in which it lies, which is taken as representing the message that's encoded as the arithmetic code. The (coding) symbols in the arithmetic code are basically taken as the digits following the "0." part of the rational number, with enough digits to specify the (single message) range it falls in. As I understand it, the range specified contains (at least) all the reals which would have, as their most significant digits after the point (the "0." part), the arithmetic code itself.

With range coding, the actual range code is taken as representing a subset of natural numbers from a set of natural numbers. The range code itself is interpreted as being a sequence of digits, those digits being the most significant digits of all the natural numbers in the subset the range code represents, but with zero or more leading zeroes added. In other words: it's a prefix common to all the naturals in the subset, padded with leading zeroes in the most significant digit places. (Without the seemingly redundant leading zeroes, the prefix "154" could be taken as representing the subset {154 000 to 154 999} as well as the subset {154 000 000 to 154 999 999} - which subset is it supposed to represent? Prefixes "00000154" and "00154" don't have this problem.) In turn, the subset itself represents a subset (of which it is a subset), which in turn represents a message. As I understand it, one of those subsets (either the subset of naturals which start with the range code as a common prefix, or the message-representing subset that includes the first subset as a subset) is the so-called range identified by the range code.

But given a range code, we can just as easily interpret it as the corresponding arithmetic code, and vice-versa. To reinterpret a range code as an arithmetic code, all we've got to do is interpret it as the part after the implicit "0." part of (a representation of) a rational number. To reinterpret an arithmetic code as a range code, we just interpret it as a common prefix in the range coding way. Or, to put it another way, if you've got a range code, you can find the corresponding arithmetic code by doing the following:-

Take the range code as representing the smallest natural number in the range it represents. (This is easy. You just take the digits of the range code itself, and append as many zeroes as necessary. The resulting, appended range code will now represent a 'range' of just one, single number. That's the number we want.)
Divide that number by the size (the cardinality) of the full set of natural numbers for all possible messages (in that range coding scheme, anyway). Or, to put it another way:-
1. call the number base being used for this range coding b;
2. call the number of digits in the appended range code, including leading zeroes, N;
3. raise b to the power of N, calling the result d; and
4. divide the number represented by that appended range code by d.

The result is a number in the range [0,1), a suitable representation of which would be the corresponding arithmetic code. That suitable representation would be the sequence of digits following the initial "0.", with as many digits as are necessary to identify the range representing the message. If the original range code uses only as many digits as necessary, and if we use only as many digits as necessary in the corresponding arithmetic code, the two codes will be identical (and so we've gone round the houses, taking the scenic route, when we could've just taken a (very) short cut).

We can do the same the other way round, to get from arithmetic codes to corresponding range codes, too.

The big upshot of this is that an arithmetic encoder/decoder is also the corresponding range encoder/decoder - it's up to you which you happen to choose to interpret it as being. After all, a range encoder is something that turns messages into corresponding range codes, and an arithmetic coder is something that turns messages into corresponding arithmetic codes, and corresponding range codes and arithmetic codes are identical. So, if it's a range coder, it's an arithmetic coder, and vice-versa.

This may have significant implications for the rumours that exist regarding arithmetic coding patents and range coding, but I am not a lawyer.

As for what range coding actually is - arithmetic coding - unless I've badly misunderstood it, it really is just another way of interpreting exactly the same computational, algorithmic, symbol-manipulating, data-processing thing.

--Simon G Best 17:26, 17 July 2006 (UTC)[reply]

I've now edited the article to bring it into line with the arithmetic coding article (which I've also edited). This is to correct and clarify the relationship between arithmetic coding and range coding, since they're really just two, slightly different ways of understanding the same thing.

--Simon G Best 15:54, 21 July 2006 (UTC)[reply]