Jump to content

User:Quale/Recursive-descent parsing

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Quale (talk | contribs) at 05:15, 5 May 2016 (initial outline intended to augment recursive descent parser). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
(diff) ← Previous revision | Latest revision (diff) | Newer revision → (diff)

In computer science, recursive descent is a top-down parsing technique that employs mutually-recursive procedures to analyze sentences in a context-free grammar. Each procedure processes one grammatical category, and has a clause for each production rule in the category (nonterminal). The procedures call each other recursively to process the nonterminals in the grammar so the structure of the resulting recursive-descent parser mirrors the organization of the grammar.[1] Recursive descent is a LL parsing technique, since it processes top-down, left-to-right, producing the leftmost derivation.

A predictive parser is a recursive descent parser that does not require backtracking. Predictive parsing is possible only for the class of LL(k) grammars, which are the context-free grammars for which there exists some positive integer k that allows a recursive descent parser to decide which production to use by examining only the next k tokens of input. The LL(k) grammars therefore exclude all ambiguous grammars, as well as all grammars that contain left recursion. Any context-free grammar can be transformed into an equivalent grammar that has no left recursion, but removal of left recursion does not always yield an LL(k) grammar. A predictive parser runs in linear time.

Recursive descent with backtracking is a more general technique that determines which production to use by trying each production in turn. Recursive descent with backtracking is not limited to LL(k) grammars, but it is not guaranteed to terminate unless the grammar is LL(k). Even when they terminate, parsers that use recursive descent with backtracking may require exponential time. As a consequence, recursive descent with backtracking is usually considered impractical for use in programming language compilers, and the term "recursive descent" is often used to mean a predictive parser.

Recursive descent is a simple parsing technique, but it is sufficient to parse most common computer programming languages and given a suitable grammar or syntax diagrams the parsers are easy to write by hand. Although it is possible to use a tool to generate recursive-descent parsers from a grammar description, most automatically-generated parsers use more powerful techniques. Despite this, recursive-descent parsers are popular because they allow convenient expression of semantic routines that derive meaning from the input and permit issuing high quality error diagnostics for syntactically invalid input.

History and applications

  • early history, Irons, Foster
  • Algol 60 and Algol-W
  • most programming languages in the early 1970s used recursive descent or operator precedence or both
  • Pascal and subsequent Wirth 1-pass compilers
  • Cfront used yacc but Stroustrup thinks recursive descent would have been better
  • lcc, gcc, clang

Technique

  • Wirth, Algorithms + Data Structures = Programs - from EBNF or syntax diagrams
  • Burge, Recursive Programming Techniques - backtracking parser written in a functional language
  • I have at least one more source detailing an explicit algorithm to write an r-d parser from an LL(1) grammar
  • possible to use explicit stack handling if implementing in a language without recursion such as FORTRAN before Fortran 90, usually not required today

Theory and practice

  • LL(k), LL(1)
  • factoring to remove left recursion, may change associativity (EBNF usually gives an easy fix)
  • substitution
  • Fraser and Hanson technique to parse expressions that reduces the number of procedures required for languages with many precedence levels such as C

Error recovery

most methods delete symbols to reach a synchronization point

  • Wirth
  • Turner
  • lcc, Fraser and Hanson
  • good discussion of error recovery in Crafting a Compiler, Fischer and LeBlanc, if only I could find my copy

Comparison to other techniques

Pros

  • simple, can be written by hand
  • fast if no backtracking needed
  • good error messages
  • easy to attach semantic routines
  • can handle ambiguity w/ ad hoc code

Cons

  • some grammars are most naturally expressed using left recursion and must be rewritten to be LL
  • many context-free grammars are not LL(k) for any finite k, LL(k) grammars are a subset of LR and LR is strictly more powerful
  • if using a parser generator, LR table-driven techniques can parse more languages

Although predictive parsers are widely used, and are frequently chosen if writing a parser by hand, programmers often prefer to use a table-based parser produced by a parser generator, either for an LL(k) language or using an alternative parser, such as LALR or LR. This is particularly the case if a grammar is not in LL(k) form, as transforming the grammar to LL to make it suitable for predictive parsing is involved. Predictive parsers can also be automatically generated, using tools like ANTLR.

Predictive parsers can be depicted using transition diagrams for each non-terminal symbol where the edges between the initial and the final states are labelled by the symbols (terminals and non-terminals) of the right side of the production rule.[2]

References

  1. ^ Burge, W.H. (1975). Recursive Programming Techniques. ISBN 0-201-14450-6.
  2. ^ Aho, Alfred V.; Sethi, Ravi; Ullman, Jeffrey (1986). Compilers: Principles, Techniques and Tools (first ed.). Addison Wesley. p. 183.