Talk:Entropy coding
Is "entropy coding" the same thing as "lossless compression"?
Remark: This article contains some problems that appear not worth correcting, because the article seems nearly fully redundant with the article on lossless compression. Also, "entropy coding" would be a better subject title than "entropy encoding". -- comments by 4.65.146.242 moved from article
- Entropy encoding is only a subset of lossless compression. LZ77 compression, for instance, is an important compression technique that isn't any form of entropy encoding. -- Antaeus Feldspar 20:43, 4 Jun 2005 (UTC)
- Actually, I don't know where Feldspar gets his definitions, but I mostly agree with the prior comment by 4.65.146.242. In the usage I have experienced, the terms "entropy coding" and "lossless compression" are synonymous, and both terms apply to such things as Lempel-Ziv coding. I have never previously heard anyone assert that LZ coding or other such things are not entropy coding. The content of this page also seems rather overly simplistic. For example, I don't think what it describes is a very accurate description of what happens in arithmetic coding. I also agree that "entropy coding" is better than "entropy encoding". Pawnbroker 05:28, 5 Jun 2005 (UTC)
- Then I don't know where you get your definitions, Pawn. People might use "entropy coding" as if it was a synonym of "lossless compression" (just as there are some major RFCs that incorrectly use "Huffman code" as if it was a synonym of "prefix code", instead of denoting only those prefix codes created by a Huffman algorithm), but that's definitely not correct usage. Entropy encoding is encoding where each symbol is assigned a pattern whose length/cost corresponds to its entropy (hence the name). While entropy encoding is quite often used with LZ77 compression, as the two techniques complement each other, LZ77 is not an example of entropy encoding. -- Antaeus Feldspar 17:03, 5 Jun 2005 (UTC)
- Can someone point to any textbook on information theory or any similar such authoritative source that draws a distinction between the two terms in this manner? Pawnbroker 05:46, 12 Jun 2005 (UTC)
- Well, the JPEG standard defines "entropy encoding" as "A lossless procedure which converts a sequence of input symbols into a sequence of bits such that the average number of bits per symbol approaches the entropy of the input symbols", a description which does not apply to dictionary coding. However, if you weren't prepared before to believe that entropy coding might have something to do with entropy, I have no idea what sort of source you're prepared to accept as "authoritative". -- Antaeus Feldspar 16:08, 12 Jun 2005 (UTC)
- Two responses: 1) Actually, Lempel-Ziv coding does fit into that definition pretty well, 2) the JPEG standard is a document that defines terms for its own purposes - it is not meant to be a textbook on information theory (it has a status perhaps roughly similar to those major RFCs that confuse Huffman coding with prefix coding - note for example that JPEG defines lossless coding in a way that includes only the kinds of lossless coding described in the JPEG document - clearly there are others). If you could point to a single textbook on information theory that draws such a distinction, that would suffice to satisfy me. Even a single paper in the IEEE Transactions on Information Theory would be something (or a digital communication textbook, or one or more papers in some other IEEE Transactions, like the Transactions on Signal Processing or the Transactions on Communications). Right now the assertion that there's a difference between the two terms is completely unsupported by published sources. Pawnbroker 04:48, 14 Jun 2005 (UTC)
- Further elaborating on point 1 above - the quoted JPEG definition of "entropy encoding" is actually not bad. (It is certainly better than the one found here at Wikipedia today.) Note, in particular, that it does not refer to a symbol-by-symbol mapping of each input symbol to a particular codeword length. It only says that the average number of bits per symbol approaches the entropy of the input symbols. There are many ways to make the average approach the entropy (or at least be substantially less than the average number of bits used to represent the input to the entropy encoding process). Those ways include techniques like LZW coding and other dictionary-based methods. Many of those ways do not involve a mapping of individual input symbols to specific coded lengths. In fact, when the input symbols are not probabilistically independent of each other (e.g., when there is some statistical dependency between input symbol values, as there is with text, for example) any simple one-to-one mapping like a Huffman code will be inferior to methods that take into account the inter-symbol dependencies. The JPEG document is therefore evidence against the Feldsparian interpretation, not for it. Pawnbroker 23:29, 14 Jun 2005 (UTC)
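A small sketch of the inter-symbol dependency point above, with a made-up transition probability chosen purely for illustration: a binary source that repeats its previous symbol 90% of the time has a uniform marginal distribution, so a symbol-by-symbol Huffman code cannot use fewer than 1 bit per symbol, while the entropy rate of the source is only about 0.47 bits per symbol.

```python
import math

# Hypothetical two-state Markov source: each symbol repeats the
# previous one with probability 0.9 (number chosen for illustration).
p_repeat = 0.9

# The marginal distribution of a single symbol is uniform (0.5 / 0.5),
# so any symbol-by-symbol code needs at least 1 bit per symbol.
marginal_entropy = -(0.5 * math.log2(0.5) + 0.5 * math.log2(0.5))  # = 1.0

# The entropy rate (conditional entropy given the previous symbol) is
# much lower; only coders that exploit the dependency (context models,
# dictionary methods, etc.) can approach it.
entropy_rate = -(p_repeat * math.log2(p_repeat)
                 + (1 - p_repeat) * math.log2(1 - p_repeat))  # ~0.469

print(f"marginal entropy: {marginal_entropy:.3f} bits/symbol")
print(f"entropy rate:     {entropy_rate:.3f} bits/symbol")
```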
- Note also that the entropy of a symbol is not the same thing as the negative log of the probability of the symbol value. Entropy includes the concept of averaging over all possible values of the input symbol. Entropy is the expected value of the negative log of the probability, not the actual value of the negative log of the probability associated with a particular symbol's value. The negative log of the probability of the sample value (assuming independent source symbol statistics) is the "information" conveyed by that sample value. The average amount of information per symbol is the entropy. Thus the above Feldspar quote that "each symbol is assigned a pattern whose length/cost corresponds to its entropy" is not correct in that sense. In a Huffman-style scheme, each symbol is assigned a pattern whose length corresponds (approximately) to the amount of information conveyed by its value (not the entropy conveyed by its value). Entropy coding is therefore conceptually focused on good average behavior on typical data, not on what happens from the micro perspective of individual symbols and individual short message lengths. Pawnbroker 23:52, 14 Jun 2005 (UTC)
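A worked example of the distinction drawn above, with made-up probabilities: the quantity that a Huffman-style code length tracks is the information -log2 p(x) of each individual value; the entropy is the average of that quantity over the whole distribution.

```python
import math

# Hypothetical source alphabet with assumed probabilities (illustration only).
probs = {"a": 0.5, "b": 0.25, "c": 0.25}

# Information conveyed by each value: -log2 of its probability.
info = {s: -math.log2(p) for s, p in probs.items()}   # a: 1, b: 2, c: 2 bits

# Entropy is the *expected* information, averaged over the distribution.
entropy = sum(p * info[s] for s, p in probs.items())   # 1.5 bits/symbol

# A Huffman code for this source (a -> "0", b -> "10", c -> "11") gives
# each symbol a length matching its information, not the entropy; the
# entropy is what the *average* length per symbol approaches.
lengths = {"a": 1, "b": 2, "c": 2}
avg_len = sum(probs[s] * lengths[s] for s in probs)    # 1.5 bits/symbol

print(info, entropy, avg_len)
```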
Repetition of content
Much of this content seems to be saying that the lengths of the codes are assigned in proportion to the negative logarithm of the probability. Couldn't this just be said once?