Jump to content

Talk:Magic compression algorithm

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Digwuren (talk | contribs) at 18:57, 28 August 2007 (2<sup>N</sup>-2?). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

2N-2?

While I was thinking about how to explain the problem nicely, a thought occurred to me. If the compressed sequence has 2N-1 possible combinations, then we have a problem with 0 length files. As an example, consider a plain sequence of N=4 bits - there are 16 such sequences. Then we can use compressed sequences of 3 bits, of which there are 2^3=8; or those of 2 bits, of which there are 2^2=4; or those of 1 bit, of which there are 2^1=2. However, we cannot use those of 0 bits, because there's no file left. Therefore there is no 2^0 term, and there are a total of 2^3+2^2+2^1=14 different compressed sequences, ie. 2N-2. See what I mean? Modest Genius talk 17:41, 28 August 2007 (UTC)[reply]

I do not think there's any reason to exclude the empty string from valid outputs. All modern common OSes have the ability to deal with empty files (though they customarily use byte granularity for determining file length).
I think I see your point, but consider it an issue of implementation, similar to usage of prefix codes to facilitate padding. If the output stream has additional restrictions on its structure, this would obviously reduce the number of available output streams, but that does not change anything qualitatively: the pigeonhole principle still applies and the proof still stands. It might be a good idea to state "at most" when discussing the (2^N-1) number, though, and it might also be clarifying to explicitly exclude the empty string from inputs, because its shortening is obviously impossible. (The counts still hold, though: there's 2^0 = 1 bit string of length 0, and there are 2^0-1 = 0 bit strings of length -1.) Digwuren 18:57, 28 August 2007 (UTC)[reply]