Talk:Comparison of parser generators

This article has not yet been rated on Wikipedia's content assessment scale.
It is of interest to the following WikiProjects:

Please add the quality rating to the {{WikiProject banner shell}} template instead of this project banner. See WP:PIQA for details.

Computer science List‑class Mid‑importance

This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.Computer scienceWikipedia:WikiProject Computer scienceTemplate:WikiProject Computer scienceComputer science

List

This article has been rated as List-class on Wikipedia's content assessment scale.

Mid

This article has been rated as Mid-importance on the project's importance scale.

Things you can help WikiProject Computer science with:

Here are some tasks awaiting attention:

Article requests :
- Requested articles/Applied arts and sciences/Computer science, computing, and Internet
Cleanup :
- Computer science articles needing attention
- Computer science articles needing expert attention
Copyedit :
- Computing
Expand :
- Computer science
Infobox :
- Computer science articles without infoboxes
Maintain :
- Timeline of computing 2020–present
Photo :
- Find pictures for the biographies of computer scientists (see List of computer scientists)
- Computing articles needing images
Stubs :
- Computer science stubs
Unreferenced :
- WikiProject Computer science/Unreferenced BLPs
Project-related :
- Tag all relevant articles in Category:Computer science and sub-categories with {{WikiProject Computer science}}

Computing: Software

	This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.ComputingWikipedia:WikiProject ComputingTemplate:WikiProject ComputingComputing
???	This article has not yet received a rating on the project's importance scale.
	This article is supported by WikiProject Software (assessed as Mid-importance).

"Confusing"

I marked the article as confusing because I made a parser generator and would like to add it to the list, but some of the columns are so unclear that I'm not sure what to put in them!

I suspect "Grammar, code" answers the question "Are actions associated with the grammar mixed or separate from the grammar?" right?
I imagine that the "Lexer" column is supposed to indicate whether the parser generator generates lexers and parsers or just parsers, but perhaps it would make more sense to have two "yes/no" columns, "generates parsers?" and "generates lexers?". As it is, there are three values, "none", "generated" and "external" and I have no idea what the difference is between "generated" and "external".
I suppose that "IDE" either means "the parser generator has its own dedicated IDE" or "the parser generator integrates with one or more IDEs". Certainly this should not be a yes/no column: it should say whether the tool has a dedicated IDE, and if it integrates with an IDE it should specify which IDE.
The "input grammar notation" seems weird in that ANTLR is marked "EBNF", but ANTLR syntax is definitely different than the EBNF described on Wikipedia. Since ANTLR is so popular I'm inclined to say that we simply refer to the notation as simply "ANTLR".
The "parsing algorithm" seems to conflate two different issues: the input grammar type (e.g. LALR(1), LL(k), PEG) versus the parsing technique (e.g. table-driven, runtime-stack recursive-descent, packrat). In the "Deterministic context-free languages" section, "parsing algorithm" seems to state the input grammar type, but in the "Parsing expression grammars" section, the grammar type is presumably "PEG" so this column specifies the parsing technique instead. Perhaps it's reasonable to have both in a single column, but let's be clear that grammar type and parsing algorithm are two different things.

I request that someone who understands these tables well should clarify them by using better column names (e.g. IDE integration instead of just "IDE") and by adding explanatory paragraphs or bullet points above or below the table. I'd also suggest merging the "Deterministic context-free languages" and "Parsing expression grammars" since both are used for parsing computer languages. I'm guessing that "General context-free, conjunctive or boolean languages" is more for natural language processing so it's probably reasonable to have a separate table; in any case, it is non-obvious what the difference between the sections are; descriptive text and links to other Wikipedia articles are needed. Finally, I think that either (A) there should an extra column for special features of the parser generator, e.g. ANTLR's gated predicates or tree parsing, which go "above and beyond" the capabilities of grammar types like LL(k) in the theoretical literature, or (B) the "grammar type" column could mention the special features (e.g. LL(*) + syntactic/semantic predicates). 66.11.82.73 (talk) 23:18, 19 March 2014 (UTC)[reply]

Perl Parse::RecDescent

Where would Perl's Parse::RecDescent be in those tables? I'd add it but I'm not confident enough in my compilers chops. It is documented at http://search.cpan.org/~jtbraun/Parse-RecDescent-1.967009/lib/Parse/RecDescent.pm

JidGom (talk) 17:32, 4 December 2013 (UTC)[reply]

What is LR(*)?

Can any expert in the field confirm that "LR(*)" is a valid term? It is mentioned as a parsing algorithm for the "Hime Paser Generator". Rasha ROR 1 (talk) 13:20, 26 March 2013 (UTC)[reply]

Merge

Due to the fact that the article now contains also a list of parser generators for regular languages, I suggest to merge the article with List of C Sharp lexer generators. --Hyperyl (talk) 22:54, 24 December 2008 (UTC)[reply]

I think the other article is much more objective than this 189.89.156.99 (talk) 21:08, 3 May 2010 (UTC)[reply]

Done--ebraminio^talk 09:23, 18 December 2010 (UTC)[reply]

Old Discussion

This discussion page needs to be cleaned up, the mentioned issues were resolved!--Hyperyl 13:49, 11 November 2007 (UTC)[reply]

Article Content

Let's decide on what information should be in the article - if a description the different columns is needed and if the two charts should be combined.--DevinCook 13:43, 11 November 2007 (UTC)[reply]

Yes, please integrate an improved version of the old descriptions. But keep the current table, it was a lot of work to edit it!--Hyperyl 13:49, 11 November 2007 (UTC)[reply]

I'm sure it took a ton of time to edit it! I do think that the license columns need to state Open Source/Freeware/etc... Only a few programmers know the details of each license on quick inspection. Also, the term "free software" - to spite it origins in advocacy of ownerless code - is not a good term anymore. It, nowadays, just refers to the "open source + public domain". I admit that that's not good for the movement, but its the reality of the art. --DevinCook 13:55, 11 November 2007 (UTC)[reply]

Free Software licenses use copyright to protect the software thus they are not public domain. Open Source and Free Software requirements are the same. The code is not ownerless Open Source is just a marketing term. Please inform yourself! Have a look at the GNU philosophy pages!--Hyperyl 14:02, 11 November 2007 (UTC)[reply]

I am very familiar with the terminology. Free software is more a philosophy nowadays rather than than an actual licensing term. The open source community worries about the implementation of source available software while the Free Software community is more concerned with ethical issues. Open source has been extremely successful - in particular Linux. To use an old Gnu quote: "For the Open Source movement, the issue of whether software should be open source is a practical question, not an ethical one. As one person put it, 'Open source is a development methodology; free software is a social movement.' For the Open Source movement, non-free software is a suboptimal solution. For the Free Software movement, non-free software is a social problem and free software is the solution." . Whether one is a Stallman-ist or a Torvalds-ist is the big question. For the most part, Torvalds has prevailed and "free software" is basically seen as an absolute form of "open source". --DevinCook 14:31, 11 November 2007 (UTC)[reply]

Well, I am a Stallman-ist. There is a big difference between us: You write proprietary software and I reject it. So there will be no consensus between us, because you prefer restriction and try to subjugate the users and I prefer freedom and try to protect the users.

But have a look at:

--Hyperyl 15:32, 11 November 2007 (UTC)[reply]

Freeware vs. Free Software

I suggest that we keep the term "freeware" to describe the software available without cost. "Free software" is, for the most part, synonymous with "open source" - which is a logical subset of freeware.

Not all the software listed is open source. --DevinCook 23:59, 11 July 2007 (UTC)[reply]

Thus it is Proprietary Software and should be included in the appropriate section. Open Source is a marketing term with a different intention than Free Software. Open Source focuses on technical aspects and Free Software on philosophical problems and thus represents philosophical values. I think these values are far more important than technical aspects.

Additionally Freeware is gratis Proprietary Software. Thus it is not Free Software (Open Source Software). Thus Free Software programmes cannot be Freeware, because they respect our freedom instead of restricting us.

If a programme is proprietary and generates source code that is licensed under Free Software License, it should be listed in the Proprietary Software section, because it still restricts the us and does not respect our freedom.

Commercial means charging money for the programme or a service related to the programme. Proprietary and Free Software can be commercial, therefore the headline makes no sense.

I reverted your revisions due to the reasons mentioned above. Please inform yourself about the things you are writing about. —Preceding unsigned comment added by Hyperyl (talk • contribs) 21:32, 28 October 2007 (UTC)[reply]

hmm - that sort of remark belongs on slashdot, not in a neutral, informative context. Tedickey 21:52, 28 October 2007 (UTC)[reply]

Best to dispense with the labels and simply cite what license applies. That's neutral, and will only offend the people who want to claim that other people belong to their group Tedickey 21:58, 28 October 2007 (UTC)[reply]

Yes this is a good solution!--Hyperyl 22:11, 28 October 2007 (UTC)[reply]

The parser generators were originally listed in one chart. I broke it into too to allow developers to see which ones were commercial and those available for free. As the revision stands now, it isn't that helpful for users. GOLD is not commerical. In fact, most of the items listed as "free software" are not what is implied by "free software". One possible solution is to once-again combine the two tables into one.--DevinCook 07:46, 29 October 2007 (UTC)[reply]

I made some changes that should provide a good compromise.--DevinCook 07:46, 29 October 2007 (UTC)[reply]

Your argumentation makes no sense: Free Software is software that respects the user's freedom, all the listed programmes do and therefore are Free Software. If you want to filter commercial software (software that is not gratis), you could simply sort by license or add a price field. The section division you did makes no sense at all you put Proprietary and Free Software into one section. This is unethical. Please merge the list as it has been democratically (based on consensus) decided. I think this is a good compromise (like in nearly all other software lists). Everyone will be happy. Maybe you could put the Proprietary programmes at the end of the list, so it is clearly distinguishable from Free Software. But this is of course optional. I would be also happy with a alphabetically sorted list.--Hyperyl 18:57, 29 October 2007 (UTC)[reply]

Free software is a logical subset of open source software. The section is designed to list software titles that the user can use without cost. Sadly, the terminology that we all use is a tad of a nightmare - and the cause of this debate. "Free software" is free, but not all free software titles are "free software". They main difference between "free software" and "freeware" is whether the source is open. By definition, "freeware" is proprietary. However, the divisions were based on commercial vs. free rather than source code availability.--DevinCook 19:25, 29 October 2007 (UTC)[reply]

Devin, this is my last attempt to explain to you what Free Software is. By definition Free Software and Open Source Software are the same. Their definitions describe the four essential freedoms. Open Source is a marketing term that was coined because business people confused Free Software with Open Source. Thus Free Software is not a subset of Open Source - it is equal to it. You shouldn't divide software in to cost categories. This is useless. A gratis copy of a Proprietary programme is still unethical and useless (because you are not free and can not properly use (in terms of usage, education, collaboration and development) the software. Freeware is gratis Proprietary Software and Free Software is not required to be gratis. Freedom is the difference and the source code and attributed rights are a requirement for this freedom. You shouldn't define free as gratis. It is free as in Freedom not as in price. Free Software licenses never make any requirements about price. Their purpose is freedom which is much more valuable than money. If you still think money is a more important than freedom, I cannot help you, because you like to be a slave, you like to be subjugated. So please correct your mistakes!--Hyperyl 23:02, 29 October 2007 (UTC)[reply]

By the way - why do you act as if you're an authority on the matter? Tedickey 23:59, 29 October 2007 (UTC)[reply]

Definitions

Q: How is here the Definition of Grammar/code: Mixed or Separated. Separated from Code (without Code inside) or as independent File? 129.69.189.32 20:56, 12 October 2006 (UTC)[reply]

Explanation

This chart needs much more explanation. I.e. adding wiki-links to LALR, etc, what does "sepperated" and "mixed" mean (per above), etc, etc, etc... —The preceding unsigned comment was added by 71.70.210.188 (talk) 00:09, 11 December 2006 (UTC).[reply]

Misleading article title

Hi, I propose to change the title of this article to List of Parser Generators. This article is about parser generators and not about parsers, isn't it?eboy 15:55, 31 August 2007 (UTC)[reply]

Excellent point. Originally, the article was titled "List of Compiler-Compilers", however, many items on the list are not compiler-compilers, including my project. I do agree that changing the title to "List of Parser Generators" or perhaps "List of Parser Development Software", would be useful. What is everyone's thoughts on the issue? --DevinCook 20:59, 1 September 2007 (UTC)[reply]

Agreed, I like "List of Parser Generators" best. // Sping 20:24, 14 September 2007 (UTC)[reply]

Column with external link

In list are generators having Wiki page (ANTLR), having extenral link (ACCENT) and no having Wiki and external link (Grammatica). My proposal is new column with extenral link; this givs advantages:

direct go to generator home page, irrespective if generator has Wiki page or not

instantly can be seen (by color) if generator has Wiki page or not

--Borneq 08:11, 18 October 2007 (UTC)[reply]

Remove website links

I suggest to remove the links to the websites of the parser generators, because it looks ugly and duplicates the efforts of the article pages (if present) which already list it.

So I propose to link the names of the parser generator that don't have a Wikipedia article to their websites. —Preceding unsigned comment added by Hyperyl (talk • contribs) 23:01, 24 December 2008 (UTC)[reply]

Sounds like a good idea. Most of the data on this page is from non-notable programs, mostly added by their respective developers Tedickey (talk) 17:59, 15 May 2010 (UTC)[reply]

non-notable Waxeye program

Recent edits have added "Waxeye", which (see google "waxeye parser") appears to be non-notable. Tedickey (talk) 10:27, 30 December 2008 (UTC)[reply]

For context, most of the google hits on "waxeye" itself refer to a bird. The program has a few hundred hits, which is a few orders of magnitude less common than one would expect for a Wikipedia topic. Tedickey (talk) 10:28, 30 December 2008 (UTC)[reply]

Is Yacc++ really LR(k)/LALR(k)?

How sure are you that Yacc++ is LR(k)/LALR(k). k seems to imply that it has unbounded lookahead, but as far as I can tell from the website's overview page, Yacc++ is LR(1)/LALR(1). Errantkid (talk) 21:47, 28 January 2009 (UTC)[reply]

I'm changing it. Please if someone disagrees let us know Errantkid (talk) 21:09, 11 June 2009 (UTC)[reply]

It actually is LR(k) and there are two ways of achieving that functionality.

First, when doing lookahead calculations the parser can actually consult a non-terminal and not just a terminal symbol, if the non-terminal represents something of unbounded length, the lookahead that it resolves is unbounded length--this is essentially a form of backtracking (and depends upon the fact that our implementation of the parser "stack" is actually more akin to a deque or maybe a pair of deques). This method is somewhat experimental and needs a command line option to enable and we have never found it useful in practice, e.g. we have never found a non-toy grammar that would compile with it turned on that wouldn't compile with normal techniques--it didn't magically resolve all the conflicts in our C++ grammar. I've always meant to write a technical article describing the technique, but my day-jobs have never left such time--no one can make a living writing parser generators.

Second, the user can also apply predicates anywhere within a rule, and again those predicates are regular expressions of non-terminals, which essentially allow the user to given an LR(k) language as the lookahead at that particular point in the grammar--this is essentially a parsing expression grammar (PEG) feature, and works similar to the mechanism in ANTLR, where we borrowed the idea from, but did it using LR rather than LL techniques, using a variation on the idea of "tunnel grammars" that Josef Grölsch invented for Cocktail. Again, it is essentially a backtracking method, but in this case the user specifies exactly where it should occur in the parsing process, so you can bound its impact, as all backtracking techniques often defeat the linear promise of LR (and LL) parsing. Still, this is practical and someone, Boris Burshtien (sp?) or perhaps Quinn Tyler-Jackson, once showed how one could use the technique to emulate a Turing Machine, so it is also completely general.

93.40.177.50 (talk) 10:12, 1 January 2017 (UTC) Christopher F Clark[reply]

LL(1) = Recursive Descent

I think it would be the best to replace all "LL(1)"s and "Recursive Descent"s with LL(1). --Chricho (talk) 20:22, 9 March 2010 (UTC)[reply]

GLR is for CFGs

Should CFG be replaced with “deterministic CFG”? --Chricho (talk) 18:47, 3 December 2010 (UTC) However, there is an Earley-parser generator which can obviously handle non-deterministic CFGs, so I think it should be put together with the GLR generators. I think we can ignore the fact that backtracking-parsers can also handle non-deterministic/ambiguous grammars. Is there any special reason for the separation of Packrat-parser-generators? The grammars they can handle are a superset of LL(1), but not a superset of LR(1) or LL(k), so I do not think they play in another league. --Chricho (talk) 18:54, 3 December 2010 (UTC) Changed… --Chricho (talk) 19:21, 3 December 2010 (UTC)[reply]

What evidence do you have that Packrat-parser generators are a subset of LR(1)? Backtracking parsers (using predicates) have been shown to emulate Turing Machines, so I would assume that PEGs could do the same.

93.40.177.50 (talk) 10:33, 1 January 2017 (UTC) Christopher F Clark[reply]

Redlink information

Redlink cleanup removed the following information contained in the link name or non-existent article title. It might be reintegrated, for example, the fact that Narwhale is parser generator, Packrat is a library, and what GDK stands for.

Acronyms

Grammar Deployment Kit|GDK
LALR Parser Generator|LPG
Lexical Analyzer and Parser Generator|Lapg

Libraries

Parsing Expression Grammar Template Library|PEGTL
Parsnip Parser Library|Parsnip
Frisby (library)|Frisby
Packrat (library)|Packrat

Lexers

Turbo Pascal Lex/Yacc|TP Yacc
Alex (lexer generator)|Alex
Dolphin (lexer generator)|Dolphin
DFA (lexer generator)|LRSTAR

Parsers

Msta parser|Msta
Laja (parser and code generator)|Laja
Laja (parser and code generator)|Laja
APG (parser generator)|APG
Aurochs (parser generator)|Aurochs
Austenx (parser generator)|AustenX
AustenX (parser generator)|AustenX
Beaver (parser generator)|Beaver
CSP (parser generator)|CSP
CUP (parser generator)|CUP
Dragon (parser generator)|Dragon
Dypgen (parser generator)|Dypgen
eli (parser generator)|eli
Elkhound (parser generator)|Elkhound
jay (parser generator)|jay
Lime (parser generator)|Lime
LISA (parser generator)|LISA
Menhir (parser generator)|Menhir
Monkey (parser generator)|Monkey
Narwhal (parser generator)|Narwhal
peg (parser generator)|peg
SLK (parser generator)|SLK
SPARK (parser generator)|SPARK
Styx (parser generator)|Styx
Tap (parser generator)|Tap
Tom (parser generator)|Tom
Waxeye (parser generator)|Waxeye
Whale (parser generator)|Whale
Wisent (parser generator)|Wisent
Yard (parser generator)|Yard
Happy (Parser)|Happy
Parsec (parser)|Parsec
Sprache (parser)|Sprache

Programming Language and software

Katadhin (programming language)|Katahdin
Essence (software)|Essence
Treetop (software)|Treetop

— Cpiral Cpiral 16:51, 8 September 2013 (UTC)[reply]

That's problematic: most of the content for this topic lists nonnotable programs of little interest to anyone except those who added the external links (likely their developers). TEDickey (talk) 17:24, 8 September 2013 (UTC)[reply]

There are some parser generators I would consider important in your list.

LALR (and its descendent LRSTAR) by Paul Mann was one of the most popular parser generators on PC's in the 1980's and 90's.

ELI was (is?) a very extensive compiler-compiler system developed by a consortium of universities, including the University of Colorado. It is one of the few systems that I know of that attempt to have notations that solve other parts of the compiler construction problem beyond lexing and parsing.

Elkhound, Menhir, Happy, and Parsec also had (have?) adherents beyond their initial author if I recall correctly.

93.40.177.50 (talk) 11:15, 1 January 2017 (UTC) Christopher F Clark[reply]

PEG.js is not packrat

Packrat has O(N) execution time, but with PEG.js you can easily verify O(N^2) execution time with the grammar:

S = ( X / . )*
X = "a" [ab]* "b"

Running this with cache enabled with a long string of "a" characters will give O(N^2) execution time. There is a misconception that any recursive descent parser with memoization is a packrat parser. That's not the case: a packrat parser achieves O(N) execution time by memoizing the result of every subexpression at every character position, including recursively unrolling repetition operators. PEG.js merely caches the result of the entire rule X.

Probably a lot of the parsers labelled "packrat" in this table are not really packrat, but I will just change the PEG.js one for now. I will call it a recursive descent parser, which is a reasonable approximation, it's also the term the author uses [1] -- Tim Starling (talk) 00:53, 4 June 2015 (UTC)[reply]

OMeta is not packrat

Along the same lines as what Tim Starling writes above, I have confirmed that OMeta is also not fully packrat. CF 2.2.3 "A Note on Memoization" on page 41 of http://www.vpri.org/pdf/tr2008003_experimenting.pdf . It is currently labeled in the table as "Packrat (modified)". I propose that we standardize on a parenthetical, such as "(selective)" which is also used to describe several other parser generators, to describe these common variants on packrat parsing, rather than fall all the way back to the generic "Recursive descent". C. Scott Ananian (talk) 17:28, 18 June 2015 (UTC)[reply]

do they all lack a GUI (Graphical User Interface) ?

I think that modern parser generators should include a GUI, this way it would be so easier to understand the parsing (and the parser generation) algorithms, thus allowing to create better parser more easily.

So the big question is : do they ALL lack a (good, intuitive, helping) GUI ???

78.227.78.135 (talk) 21:54, 9 November 2015 (UTC)[reply]

for example : https://github.com/akshayhebbarys/lalr-parser the GUI is quite bad ... they have only 4 examples, none of which being simply an arthmetic expression parser... we are in 2015, and the tools nearly didn't evolve since 1990 (bison) !!!! 78.227.78.135 (talk) 23:54, 9 November 2015 (UTC)[reply]

Most parser generators like a "good" (or any) GUI. However, I have from normally reliable sources that the one with Visual Parse++ was good. Unfortunately, that parser generator (and its GUI) are no longer available. I've been in contact with the author, but was never able to revive the product. The cost of creating (and maintaining) a parser generator, beyond creating one of "student" quality, is simply commercially prohibitive and the market vanishingly small.

93.40.177.50 (talk) 10:28, 1 January 2017 (UTC) Christopher F Clark[reply]