Talk:LZ4 (compression algorithm)

This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Start-class on Wikipedia's content assessment scale.
This article has been rated as Mid-importance on the project's importance scale.

Removals

[Copied from User talk:Intgr#Deletion from LZ4]

This is a small matter. As it happens, I don't support your removal of the statement "In a worst-case scenario, incompressible data gets increased by 0.4%." (and sketchy source) from the LZ4 article.

By design, quoting the entire incompressible byte stream (storing it as literals) adds an overhead of one 0xFF length byte for every 255 bytes of uncompressed data, i.e. roughly 0.4%. Stating that is only slightly more OR (original research) than an unsourced claim that wrapping quote marks around an N-character string increases its representation to N+2 characters.
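
As a rough illustration of that arithmetic, here is a minimal C sketch (not from the article or its sources): one extra length byte per 255 literal bytes works out to about 0.4%, which is also the kind of worst-case bound the reference library reports via LZ4_compressBound(). The helper name and the small fixed "+ 16" slack term below are assumptions for the sketch, not quotes of lz4.h.

/* Sketch of the worst-case growth described above: one extra 0xFF
 * length byte per 255 uncompressed (literal) bytes, plus a little
 * fixed slack for headers/terminators (the 16 is an assumption). */
#include <stdio.h>

static long worst_case_bound(long input_size)
{
    return input_size + input_size / 255 + 16;
}

int main(void)
{
    long n = 100000;                  /* same input size as the dd test further down */
    long bound = worst_case_bound(n);
    printf("%ld bytes -> at most %ld bytes (%.2f%% overhead)\n",
           n, bound, 100.0 * (bound - n) / n);   /* prints roughly 0.41% */
    return 0;
}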

By taking this statement out, you are catering to the common misunderstanding that there is such a thing as a compression algorithm which only ever makes the object smaller. By far this is the more severe of the two evils.

I also don't support this removal (one-time-only IP editor):

https://en.wikipedia.org/w/index.php?title=LZ4_%28compression_algorithm%29&diff=prev&oldid=653088035

Your call on both issues. I'm not going to wade in with my own edits. — MaxEnt 03:13, 20 April 2015 (UTC)

@MaxEnt: I copied this discussion here from my talk page so other people interested in the article can chime in and/or improve it.
The first edit that MaxEnt is talking about is this. Admittedly I didn't do a good job at explaining that change in the edit comment.
I mainly object to the material removed in both edits on sourcing grounds; if there were good sources supporting the claims, I would be all for adding them back. But usually blogs are not considered reliable sources.
I also tried to reproduce the 0.4% overhead on my own, many times, but always arrived at the same result:
% dd if=/dev/urandom bs=100000 count=1 |lz4 |wc -c
1+0 records in
1+0 records out
100000 bytes (100 kB) copied, 0.00602624 s, 16.6 MB/s
100019
Am I missing something? That's a 0.02% overhead, and it's even lower for larger input sizes. -- intgr [talk] 08:53, 20 April 2015 (UTC)
Dear intgr, what you are missing is that you measured 0.02% for one specific case. That specific case is not the "worst-case scenario". --DavidCary (talk) 09:13, 30 May 2015 (UTC)