Scannerless parsing
Appearance
Lexerless parsing (also called "scannerless parsing") refers to the use of a single formalism to express both the lexical and context-free syntax used to parse a language.
Advantages
- Only a single metasyntax is required
- Constructs which are typically lexical in nature are no longer restricted to regular grammars. For example, lexerless parsers can easily deal with nested comments, which cannot be handled by languages with a separate regular-language lexer.
- Context-free languages are closed under composition, meaning that grammars for lexerless parsers can be merged without human intervention. This is not true in theory and almost never true in practice for languages with a separate lexer.
Disadvantages
Unfortunately, the result of composing a lexical and context-free grammar is often no longer strictly context-free. Visser identified five key extensions to classical context-free syntax which handle almost all common non-context-free constructs arising in practice:
- Follow restrictions (a limited form of "longest match")
- Reject productions (a limited form of negative matching, as found in [boolean grammar]s)
- Preference attributes (to handle the "dangling else" construct in C-like languages)
- Associativity attributes
- Priority rules