Jump to content

Scannerless parsing

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by AshtonBenson (talk | contribs) at 01:03, 25 November 2005. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
(diff) ← Previous revision | Latest revision (diff) | Newer revision → (diff)

Lexerless parsing (also called "scannerless parsing") refers to the use of a single formalism to express both the lexical and context-free syntax used to parse a language.

Advantages

  • Only a single metasyntax is required
  • Constructs which are typically lexical in nature are no longer restricted to regular grammars. For example, lexerless parsers can easily deal with nested comments, which cannot be handled by languages with a separate regular-language lexer.
  • Context-free languages are closed under composition, meaning that grammars for lexerless parsers can be merged without human intervention. This is not true in theory and almost never true in practice for languages with a separate lexer.

Disadvantages

Unfortunately, the result of composing a lexical and context-free grammar is often no longer strictly context-free. Visser identified five key extensions to classical context-free syntax which handle almost all common non-context-free constructs arising in practice:

  • Follow restrictions (a limited form of "longest match")
  • Reject productions (a limited form of negative matching, as found in [boolean grammar]s)
  • Preference attributes (to handle the "dangling else" construct in C-like languages)
  • Associativity attributes
  • Priority rules

Visser, E. (1997b). Scannerless generalized-LR parsing. Technical Report P9707, Programming Research Group, University of Amsterdam