
Talk:Magic compression algorithm

From Wikipedia, the free encyclopedia

2^N - 2?

While I was thinking about how to explain the problem nicely, a thought occurred to me. If the compressed sequence has 2^N - 1 possible combinations, then we have a problem with 0-length files. As an example, consider a plain sequence of N = 4 bits - there are 16 such sequences. Then we can use compressed sequences of 3 bits, of which there are 2^3 = 8; or those of 2 bits, of which there are 2^2 = 4; or those of 1 bit, of which there are 2^1 = 2. However, we cannot use those of 0 bits, because there's no file left. Therefore there is no 2^0 term, and there are a total of 2^3 + 2^2 + 2^1 = 14 different compressed sequences, i.e. 2^N - 2. See what I mean? Modest Genius talk 17:41, 28 August 2007 (UTC)
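A tiny Python sketch of the count above (purely illustrative; the variable names and the choice of N = 4 simply mirror the example):

    # For N = 4 there are 2^4 = 16 plain sequences, but only
    # 2^3 + 2^2 + 2^1 = 14 non-empty compressed sequences shorter than N bits,
    # i.e. 2^N - 2 once the empty string is excluded (2^N - 1 if it is allowed).
    N = 4
    inputs = 2 ** N
    shorter_nonempty = sum(2 ** k for k in range(1, N))
    shorter_with_empty = sum(2 ** k for k in range(0, N))
    print(inputs, shorter_nonempty, shorter_with_empty)   # 16 14 15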

I do not think there's any reason to exclude the empty string from valid outputs. All modern common OSes have the ability to deal with empty files (though they customarily use byte granularity for determining file length).
I think I see your point, but consider it an issue of implementation, similar to usage of prefix codes to facilitate padding. If the output stream has additional restrictions on its structure, this would obviously reduce the number of available output streams, but that does not change anything qualitatively: the pigeonhole principle still applies and the proof still stands. It might be a good idea to state "at most" when discussing the (2^N - 1) number, though, and it might also be clarifying to explicitly exclude the empty string from inputs, because its shortening is obviously impossible. (The counts still hold, though: there's 2^0 = 1 bit string of length 0, and there are 2^0 - 1 = 0 bit strings of length -1.) Digwuren 18:57, 28 August 2007 (UTC)
Sure, though it's an interesting concept. It would certainly be an impressive algorithm that managed to compress things down to empty files! I suppose I'm thinking too literally, including the file descriptor in its length, and then worrying about stringing empty files together ;).
The special case of an empty string probably doesn't need mentioning - after all, the algorithm isn't magic if it only works on one specific input, and the impossibility of a sequence of -1 bits is self-evident. Interesting article btw. Modest Genius talk 19:22, 28 August 2007 (UTC)
The key problem with the text in the article is that it assumes the input sequence is exactly N bits long but the output may be any length up to N bits. If you know your input sequence is a fixed length, you consider it acceptable to permit the zero-length output, and your granularity is bits, then you can indeed compress any input. Plugwash 22:18, 17 September 2007 (UTC)
Not even then. You have 2^N different inputs of length N, but only 2^N - 1 different possible outputs of length strictly less than N. –Henning Makholm 01:27, 18 September 2007 (UTC)
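A brute-force way to see this for N = 4 (a sketch with a greedy assignment of my own choosing; any other assignment strategy runs into the same wall):

    from itertools import product

    # Try to give each 4-bit input a distinct output that is strictly shorter.
    # The pool of candidates (all strings of length 0..3, empty string included)
    # has only 2^4 - 1 = 15 members, so it runs dry before all 16 inputs are covered.
    N = 4
    inputs = [''.join(bits) for bits in product('01', repeat=N)]
    candidates = [''.join(bits) for k in range(N) for bits in product('01', repeat=k)]

    codebook = {}
    for s in inputs:
        if not candidates:
            print('ran out of shorter outputs after', len(codebook), 'of', len(inputs), 'inputs')
            break
        codebook[s] = candidates.pop()
    # prints: ran out of shorter outputs after 15 of 16 inputs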

Suggested merge

It seems to me that this article would make better sense as a section in the lossless data compression article. The merge-proposal templates seem to want the discussion to take place on the talk page of the proposed merge target, so please go there and leave your opinion. –Henning Makholm 20:40, 13 September 2007 (UTC)

Description

In reference to [1], I think the article was better with the maths bits in it, and that the description was easier for a layman to understand beforehand. Comments? Modest Genius talk 21:21, 18 September 2007 (UTC)

Time

I think it's quite amusing that while Wikipedia does not allow for unreferenced information, theories, etc., it does allow for these types of pages. There are algorithms that actually work on maximally random sets, or at least one algorithm.

As always, 'magic' is just a word for extremely advanced technology, sometimes produced naturally through twists of fate. Much as Wikipedia is not for unreferenced theories, I think this 'magic compression algorithm' page is similarly that... a theory with a faulty reference.

The only caveat is that there is a minimal informational density required for the algorithm to function. I've read the supposed 'proof' of the counterargument against any 'magic' method, but the fact remains... binary is the worst way to lay out data in 2D that you want compressed. So doing a 'counting argument' against the string sets in a recursive/folding methodology simply isn't valid... many of the resultant strings are indeed doubled up, since they are phasic to the wholly compressed signal. You have to use time to understand time.

I suggest you remove this (primitive/offensive) page. Kudos to the referencers, of course, for sourcing out the info for this page... but it has to be valid.

UmbraPanda 07:47, 9 October 2007 (UTC)

No, it is not just a theory; it is a mathematically proven fact that any compression algorithm that makes some files smaller must make other files larger. If you truly believe you have an algorithm that violates that, please post it somewhere suitable (e.g. one of the many compression newsgroups) for debunking. Plugwash 14:31, 9 October 2007 (UTC)
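A quick empirical companion to the point above, using Python's zlib purely as a stand-in for 'any' lossless compressor (the 4 KiB of random bytes is an arbitrary choice of mine): incompressible input comes out larger than it went in, because of the format's own overhead.

    import os
    import zlib

    data = os.urandom(4096)               # 4 KiB of random, hence incompressible, bytes
    compressed = zlib.compress(data, 9)   # level 9 = maximum effort
    print(len(data), len(compressed))     # the output is typically a few bytes longer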


I very much understand that you're saying an algorithm that reduces any size under the threshold isn't going to work. I propose that you're missing my point completely. There is very much a lower bound to the size of the compressed file; anything after that relates to the potential space as a dictionary, not in the sense of the actual content, but the ratio of index size to possible matches.
Like so:
1) Say you have an algorithm that will compress 1 in 100 pieces of information by 99%.
2) So, out of every 100 pieces, you have a 1% chance of it working on any given piece.
3) If you were to check 100 pieces at a time, the probability is 1 that you'd find a match. Great. But in terms of data density, this is impractical, since you'd need to mark each of the other 99 pieces with a single piece of information saying it isn't able to be compressed. This pushes it over the limit, making the algorithm unremarkable.
4) Now, check 100,000 pieces at a time. The probability is 100% that the algorithm finds at least one match inside the data, by checking each sequential block of 100.
5) Great. But the index of that fragment needs to be stored, along with the algorithm's parameters on how to expand the fragment if necessary.
6) There are now two separate pieces: a 99%-compressed fragment along with displacement information, and the rest of the file, minus the uncompressed fragment.
7) Quantizing this, there is a lower bound exactly where the bit cost of the index plus the compressed fragment plus the expansion parameters is lower than the relative size of the uncompressed fragment. Quantizing the 100 units, those could be 100 units of base one-trillion information.
8) As long as the bit cost of the uncompressed fragment that's taken out of the original file is larger than the bit cost of the displacement index plus the compressed fragment plus the expansion parameters, this algorithm will always function perfectly (a rough accounting sketch follows below).
9) Realistically, you'd want to choose an algorithm that works with the base units of the physical/informational system, not necessarily binary.
This is where you guys are dying. Binary is the worst possible way to represent data you want compressed. You need to think about what exactly this system of 'numbers' is, and what its container is able to do, outside of any static, quantized representation.
Again, I'd like to stress that I completely understand what you think you're looking at here, and, as simply as possible, to say that I'm not arguing about the fact that if you convert yourself into an algorithm and reduce to binary, you die, since there's no way to map 1 to 10 and 11 at the same time. Just conceive of thinking higher than that... displacement, bases, parameters, quantization. It's a wacky paradigm, but everything is saved.
So, thanks for your work; keep the page if you need to show that there's no way to do this without a lower boundary, but there's no reason to continue propagating untruths about the functionality after the threshold is reached on the most widely accessible encyclopedia on the planet.
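The accounting in steps 3 and 7-8 can be made concrete. The numbers below (a piece size of B bits, a 1-bit per-piece flag as the cheapest conceivable index information, exactly one piece per hundred compressing to 1% of its size) are assumptions of mine chosen to mirror the description, not anything the scheme itself specifies; the sketch only shows where the break-even point sits, and that it presupposes the input actually contains such a compressible piece.

    # Hedged sketch of the accounting in steps 3 and 7-8; the exact numbers are
    # assumptions of mine: each piece is B bits, one piece per group of 100
    # compresses to 1% of its size, and every piece carries a 1-bit flag saying
    # whether it was compressed (the cheapest conceivable index information).
    def group_cost(B, pieces=100, compressed_pieces=1, ratio=0.01, flag_bits=1):
        stored_plain = (pieces - compressed_pieces) * B
        stored_compressed = compressed_pieces * B * ratio
        overhead = pieces * flag_bits
        return stored_plain + stored_compressed + overhead

    for B in (8, 64, 128, 1024):
        original = 100 * B
        print(B, original, group_cost(B), group_cost(B) < original)
    # Break-even needs 0.99 * B > 100, i.e. pieces of roughly 102 bits or more,
    # and even then only when the input actually contains a 99%-compressible piece.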