Talk:Grammar-based code

Computing Stub‑class

	This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Stub	This article has been rated as Stub-class on Wikipedia's content assessment scale.
???	This article has not yet received a rating on the project's importance scale.
	This article has been automatically rated by a bot or other tool as Stub-class because it uses a stub template. Please ensure the assessment is correct before removing the `|auto=` parameter.

I will update the article on grammar-based coding from time to time -- Da-ke.

The article needs to be updated with references on Structured Grammar Based Codes. Aman bhatia 05:34, 11 August 2008 (UTC)

It would be good to spell out why SLGs (straight-line grammars) are interesting for compression: they can be decoded very quickly. Constructing the grammar may take a relatively long time, but decoding is just a depth-first traversal of a DAG and runs in linear time. This makes grammar-based compression attractive for data that is encoded once but downloaded and/or decompressed many times. (GLZA is the champ for compressing all of Wikipedia well and decompressing it in a big hurry.) -- Preceding unsigned comment added by 2602:306:CD5B:FD30:61C0:3B3E:902:22A7 (talk) 02:00, 3 December 2017 (UTC)
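
To make the decoding point concrete, here is a minimal sketch of expanding a straight-line grammar. The dictionary representation, the function name `expand`, and the toy grammar are all assumptions made for illustration; this is not taken from GLZA or any other real compressor. Each step either emits a terminal or replaces a nonterminal with its right-hand side, so when every rule has at least two symbols the work is roughly proportional to the length of the output.

```python
def expand(grammar, start="S"):
    """Expand a straight-line grammar into the string it generates.

    `grammar` maps each nonterminal to a list of symbols. Any symbol that
    is not a key of `grammar` is treated as a terminal. (Hypothetical
    representation, chosen only for this sketch.)
    """
    out = []
    # Explicit stack instead of recursion, so deeply nested grammars do not
    # hit Python's recursion limit. Right-hand sides are pushed in reverse
    # so that the leftmost symbol is popped first (depth-first, left to right).
    stack = [start]
    while stack:
        sym = stack.pop()
        if sym in grammar:
            stack.extend(reversed(grammar[sym]))
        else:
            out.append(sym)
    return "".join(out)


# Toy grammar: rule B is shared by S and A, so the rules form a DAG, not a tree.
grammar = {
    "S": ["A", "A", "B"],
    "A": ["B", "c"],
    "B": ["a", "b"],
}

decoded = expand(grammar)  # "abcabcab"
```

A real decoder would also have to undo the entropy coding of the grammar itself; that step is left out here.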

The problem of constructing the "smallest grammar" isn't just intractable; for understanding data compression with entropy coding, it's the wrong problem. The compression-optimal SLG is generally not irreducible, because some patterns repeat infrequently but are made of very frequently repeating constituents whose entropy codes are therefore short. In such cases, it's worth "spelling out" each repeat in terms of (the short codes for) its constituents, rather than paying the cost of encoding the repeating pattern as a separate rule. See Conrad and Wilson (GLZA) on this.
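
A rough numerical illustration of the trade-off described above, under a deliberately simplified cost model: each symbol costs its ideal entropy-code length, -log2(count / total), and adding a rule is assumed not to change any other symbol's count. The function names and the numbers are invented for this sketch and are not taken from Conrad and Wilson or from GLZA.

```python
import math


def code_length(count, total):
    """Ideal entropy-code length in bits for a symbol seen `count` times out of `total`."""
    return -math.log2(count / total)


def spell_out_cost(constituent_bits, repeats):
    """Bits needed to write the pattern in full at each of its occurrences."""
    return repeats * sum(constituent_bits)


def rule_cost(constituent_bits, repeats, rule_symbol_bits):
    """Bits needed to define the rule once, then reference it at every occurrence."""
    return sum(constituent_bits) + repeats * rule_symbol_bits


total = 1000  # hypothetical number of symbols in the encoded grammar

# Two very frequent constituents, each 250 of the 1000 symbols: 2 bits apiece.
constituent_bits = [code_length(250, total), code_length(250, total)]

# The pattern built from them occurs only twice, so a dedicated rule symbol
# would be rare and therefore expensive (about 9 bits per reference).
repeats = 2
rule_symbol_bits = code_length(repeats, total)

inline_bits = spell_out_cost(constituent_bits, repeats)                # 8.0 bits
with_rule_bits = rule_cost(constituent_bits, repeats, rule_symbol_bits)  # about 21.9 bits
```

With these made-up numbers, leaving the pattern spelled out is cheaper even though the resulting grammar is not irreducible. With a more frequent pattern or rarer constituents the comparison goes the other way.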