Shift-reduce parser

In computer science, bottom-up parsing is any method of discerning the hierarchical structure and meaning of a linear input text, by identifying and processing its lowest-level small details first, before its mid-level structures, and leaving the highest-level overall structure to last.

Bottom-up Versus Top-down

The bottom-up name comes from the concept of a parse tree, in which the most detailed parts are at the bushy bottom of the (upside-down) tree, and larger structures composed from them are in successively higher layers, until at the top or "root" of the tree a single unit describes the entire input stream. A bottom-up parse discovers and processes that tree starting from the bottom left end, and incrementally works its way upwards and rightwards. A parser may act on the structure hierarchy's low, mid, and highest levels without ever creating an actual data tree; the tree is then merely implicit in the parser's actions. Bottom-up parsing lazily waits until it has scanned and parsed all parts of some construct before committing to what the combined construct is.

The opposite of this are top-down parsing methods, in which the input's overall structure is decided (or guessed at) first, before dealing with mid-level parts, leaving the lowest-level small details to last. A top-down parse discovers and processes the hierarchical tree starting from the top, and incrementally works its way downwards and rightwards. Top-down parsing eagerly decides what a construct is much earlier, when it has only scanned the leftmost symbol of that construct and has not yet parsed any of its parts.

If a language grammar has multiple rules that may start with the same leftmost symbols but have different endings, then that grammar can be efficiently handled by a deterministic bottom-up parse but cannot be handled top-down without guesswork and backtracking. So bottom-up parsers handle a somewhat larger range of computer language grammars than do deterministic top-down parsers.

Type of bottom-up parsers

It is common for bottom-up parsers to take the form of general parsing engines, which can either parse, or generate a parser for, a specific programming language given a specification of its grammar. Perhaps the most well known generalized parser generators are YACC and GNU bison.

Bottom-up parsing is sometimes done by backtracking or by recursive ascent parsers. But much more commonly, bottom-up parsing is done by a table-driven shift-reduce parser. Shift-reduce parsers have varied methods to decide when they have found the right end of a complete instance of some grammar rule.

The common classes of shift-reduce parsers are:

LR parser
- LR(0) - No lookahead symbol
- SLR(1) - Simple with one lookahead symbol. Used by XPL. Less powerful than LALR.
- LALR(1) - Lookahead bottom up, not as powerful as full LR(1) but simpler to implement. YACC deals with this kind of grammar.
- LR(1) - Most general grammar, but most complex to implement.
- LR( $n$ ) - (where $n$ is a positive integer) indicates an LR parser with $n$ lookahead symbols; while grammars can be designed that require more than 1 lookahead, practical grammars try to avoid this because increasing $n$ can theoretically require exponentially more code and data space (in practice, this may not be as bad). Also, the class of LR( $n$ ) languages is the same as that of LR(1) languages.
Precedence parsers
- Operator-precedence parser, a very simple numerical method that works for expressions but not general program syntax.
- Simple precedence parser, used in PL360. Does not handle common programming languages.
- Extended precedence parser
- Mixed Strategy Precedence parser, used by the original version of XPL. Less powerful than SLR.

Shift-reduce parsers

The most common bottom-up parsers are the shift-reduce parsers. These parsers examine the input tokens and either shift (push) them onto a stack or reduce elements at the top of the stack, replacing a right-hand side by a left-hand side.

Action table

Often an action (or parse) table is constructed which helps the parser determine what to do next. The following is a description of what can be held in an action table.

Actions

Shift - push token onto stack
Reduce - remove handle from stack and push on corresponding nonterminal
Accept - recognize sentence when stack contains only the distinguished symbol and input is empty
Error - happens when none of the above is possible; means original input was not a sentence

Shift and reduce

A shift-reduce parser uses a stack to hold the grammar symbols while awaiting reduction. During the operation of the parser, symbols from the input are shifted onto the stack. If a prefix of the symbols on top of the stack matches the right-hand side (RHS) of a grammar rule which is the correct rule to use within the current context, then the parser reduces the RHS of the rule to its left-hand side (LHS), replacing the RHS symbols on top of the stack with the nonterminal occurring on the LHS of the rule. This shift-reduce process continues until the parser terminates, reporting either success or failure. It terminates with success when the input is legal and is accepted by the parser. It terminates with failure if an error is detected in the input.

The parser is a stack automaton which is in one of several discrete states. In reality, in the case of LR parsing, the parse stack contains states, rather than grammar symbols. However, since each state corresponds to a unique grammar symbol, the state stack can be mapped onto the grammar symbol stack mentioned earlier.

Algorithm: Shift-reduce parsing

Start with the sentence to be parsed as the initial sentential form
Until the sentential form is the start symbol do:
1. Scan through the input until we recognise something that corresponds to the RHS of one of the production rules (this is called a handle)
2. Apply a production rule in reverse; i.e., replace the RHS of the rule which appears in the sentential form with the LHS of the rule (an action known as a reduction)

In step 2.1 above we are "shifting" the input symbols to one side as we move through them; hence a parser which operates by repeatedly applying steps 2.1 and 2.2 above is known as a shift-reduce parser.

A shift-reduce parser is most commonly implemented using a stack, where we proceed as follows:

start with an empty stack
a "shift" action corresponds to pushing the current input symbol onto the stack
a "reduce" action occurs when we have a handle on top of the stack. To perform the reduction, we pop the handle off the stack and replace it with the nonterminal on the LHS of the corresponding rule.

Example

Take the grammar:

 Sentence   --> NounPhrase VerbPhrase
 NounPhrase --> Art Noun
 VerbPhrase --> Verb | Adverb Verb
 Art        --> the | a | ...
 Verb       --> jumps | sings | ...
 Noun       --> dog | cat | ...

And the input:

the dog jumps

Then the bottom up parsing is:

Stack                  Input Sequence
()                     (the dog jumps)
(the)                  (dog jumps)      SHIFT word onto stack
(Art)                  (dog jumps)      REDUCE using grammar rule
(Art dog)              (jumps)          SHIFT..
(Art Noun)             (jumps)          REDUCE..
(NounPhrase)           (jumps)          REDUCE
(NounPhrase jumps)     ()               SHIFT
(NounPhrase Verb)      ()               REDUCE
(NounPhrase VerbPhrase)()               REDUCE
(Sentence)             ()               SUCCESS

External links

Parsing Simulator This simulator is used to generate examples of shift-reduce parsing
An example of shift-reduce parsing (which is a type of bottom up parsing), with a small grammar, state diagram, and C language code to implement the parser
Course notes on shift reduce parsing
A good non-technical tutorial in the context of natural (human) languages ^{[dead link]} (Instead use an archived version of the page)
A discussion of shift-reduce conflicts in bottom up parsers. A knowledgeable but technical article.
Yet another bottom-up parsing illustration