User:Quale/Recursive-descent parsing

In computer science, recursive descent is a top-down parsing technique that employs mutually-recursive procedures to analyze sentences in a context-free grammar. Each procedure processes one grammatical category, and has a clause for each production rule in the category (nonterminal). The procedures call each other recursively to process the nonterminals in the grammar so the structure of the resulting recursive-descent parser mirrors the organization of the grammar.^[1] Recursive descent is a LL parsing technique, since it processes top-down, left-to-right, producing the leftmost derivation.

A predictive parser is a recursive descent parser that does not require backtracking. Predictive parsing is possible only for the class of LL(k) grammars, which are the context-free grammars for which there exists some positive integer k that allows a recursive descent parser to decide which production to use by examining only the next k tokens of input. The LL(k) grammars therefore exclude all ambiguous grammars, as well as all grammars that contain left recursion. Any context-free grammar can be transformed into an equivalent grammar that has no left recursion, but removal of left recursion does not always yield an LL(k) grammar. A predictive parser runs in linear time.

Recursive descent with backtracking is a more general technique that determines which production to use by trying each production in turn. Recursive descent with backtracking is not limited to LL(k) grammars, but it is not guaranteed to terminate unless the grammar is LL(k). Even when they terminate, parsers that use recursive descent with backtracking may require exponential time. As a consequence, recursive descent with backtracking is usually considered impractical for use in programming language compilers, and the term "recursive descent" is often used to mean a predictive parser.

Recursive descent is a simple parsing technique, but it is sufficient to parse most common computer programming languages and given a suitable grammar or syntax diagrams the parsers are easy to write by hand. Although it is possible to use a tool to generate recursive-descent parsers from a grammar description, most automatically-generated parsers use more powerful techniques. Despite this, recursive-descent parsers are popular because they allow convenient expression of semantic routines that derive meaning from the input and permit issuing high quality error diagnostics for syntactically invalid input.

History and applications

early history, Irons, Foster
Algol 60 and Algol-W
most programming languages in the early 1970s used recursive descent or operator precedence or both
Pascal and subsequent Wirth 1-pass compilers
Cfront used yacc but Stroustrup thinks recursive descent would have been better
lcc, gcc, clang

Technique

Wirth, Algorithms + Data Structures = Programs - from EBNF or syntax diagrams
Burge, Recursive Programming Techniques - backtracking parser written in a functional language
I have at least one more source detailing an explicit algorithm to write an r-d parser from an LL(1) grammar
possible to use explicit stack handling if implementing in a language without recursion such as FORTRAN before Fortran 90, usually not required today

Theory and practice

LL(k), LL(1)
factoring to remove left recursion, may change associativity (EBNF usually gives an easy fix)
substitution
Fraser and Hanson technique to parse expressions that reduces the number of procedures required for languages with many precedence levels such as C

Error recovery

most methods delete symbols to reach a synchronization point

Wirth
Turner
lcc, Fraser and Hanson
good discussion of error recovery in Crafting a Compiler, Fischer and LeBlanc, if only I could find my copy

Comparison to other techniques

Pros

simple, can be written by hand
fast if no backtracking needed
good error messages
easy to attach semantic routines
can handle ambiguity w/ ad hoc code

Cons

some grammars are most naturally expressed using left recursion and must be rewritten to be LL
many context-free grammars are not LL(k) for any finite k, LL(k) grammars are a subset of LR and LR is strictly more powerful
if using a parser generator, LR table-driven techniques can parse more languages

Although predictive parsers are widely used, and are frequently chosen if writing a parser by hand, programmers often prefer to use a table-based parser produced by a parser generator, either for an LL(k) language or using an alternative parser, such as LALR or LR. This is particularly the case if a grammar is not in LL(k) form, as transforming the grammar to LL to make it suitable for predictive parsing is involved. Predictive parsers can also be automatically generated, using tools like ANTLR.

Predictive parsers can be depicted using transition diagrams for each non-terminal symbol where the edges between the initial and the final states are labelled by the symbols (terminals and non-terminals) of the right side of the production rule.^[2]

References

^ Burge, W.H. (1975). Recursive Programming Techniques. ISBN 0-201-14450-6.
^ Aho, Alfred V.; Sethi, Ravi; Ullman, Jeffrey (1986). Compilers: Principles, Techniques and Tools (first ed.). Addison Wesley. p. 183.

[1] Burge, W.H. (1975). Recursive Programming Techniques. ISBN 0-201-14450-6.

[2] Aho, Alfred V.; Sethi, Ravi; Ullman, Jeffrey (1986). Compilers: Principles, Techniques and Tools (first ed.). Addison Wesley. p. 183.

[1]

[2]