Talk:Magic compression algorithm

A fact from Magic compression algorithm appeared on Wikipedia's Main Page in the Did you know column on 2 September 2007. A record of the entry may be seen at Wikipedia:Recent additions/2007/September.

Wikipedia

2^N-2?

While I was thinking about how to explain the problem nicely, a thought occurred to me. If the compressed sequence has 2^N-1 possible combinations, then we have a problem with 0 length files. As an example, consider a plain sequence of N=4 bits - there are 16 such sequences. Then we can use compressed sequences of 3 bits, of which there are 2^3=8; or those of 2 bits, of which there are 2^2=4; or those of 1 bit, of which there are 2^1=2. However, we cannot use those of 0 bits, because there's no file left. Therefore there is no 2^0 term, and there are a total of 2^3+2^2+2^1=14 different compressed sequences, ie. 2^N-2. See what I mean? Modest Genius ^talk 17:41, 28 August 2007 (UTC)[reply]

I do not think there's any reason to exclude the empty string from valid outputs. All modern common OSes have the ability to deal with empty files (though they customarily use byte granularity for determining file length).

I think I see your point, but consider it an issue of implementation, similar to usage of prefix codes to facilitate padding. If the output stream has additional restrictions on its structure, this would obviously reduce the number of available output streams, but that does not change anything qualitatively: the pigeonhole principle still applies and the proof still stands. It might be a good idea to state "at most" when discussing the (2^N-1) number, though, and it might also be clarifying to explicitly exclude the empty string from inputs, because its shortening is obviously impossible. (The counts still hold, though: there's 2⁰ = 1 bit string of length 0, and there are 2⁰-1 = 0 bit strings of length -1.) Digwuren 18:57, 28 August 2007 (UTC)[reply]

Sure, though it's an interesting concept. It would certainly be an impressive algorithm that managed to compress things down to empty files! I suppose I'm thinking too literally, including the file descriptor in it's length, and then worrying about stringing empty files together ;).

The special case of an empty string probably doesn't need mentioning - after all the algorithm isn't magic if it only works on one specific input, and the impossibility of a sequence of -1 bits is self-evident. Interesting article btw. Modest Genius ^talk 19:22, 28 August 2007 (UTC)[reply]

The key problem with the text in the article is it assumes that the input sequence is exactly N bits long but the output may be any length up to N bits. If you know your input sequence is a fixed length and you consider it acceptable to permit the zero length ouput and your granularity is bits then you can indeed compress any input. Plugwash 22:18, 17 September 2007 (UTC)[reply]

Not even then. You have

2^{N}

different inputs of length

N

, but only

2^{N}-1

different possible outputs of length strictly less than

N

. –Henning Makholm 01:27, 18 September 2007 (UTC)[reply]

Suggested merge

It seems to me that this article would make better sense as a section in the lossless data compression article. The merge-proposal templates seem to want the discussion to take place on the talk page of the proposed merge target, so please go there and leave your opinion. –Henning Makholm 20:40, 13 September 2007 (UTC)[reply]

Description

In reference to [1], I think the article was better with the maths bits in it, and that the description was easier for a layman to understand beforehand. Comments? Modest Genius ^talk 21:21, 18 September 2007 (UTC)[reply]

2N-2?

Suggested merge

Description

2^N-2?