Jump to content

Recursive ascent parser

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Djspiewak (talk | contribs) at 15:29, 24 March 2009. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Recursive ascent parsing is a technique for implementing an LALR parser which uses mutually-recursive functions rather than tables. Thus, the parser is directly encoded in the host language similar to recursive descent. Direct encoding usually yields a parser which is faster than its table-driven equivalent[1] for the same reason that compilation is faster than interpretation. It is also (nominally) possible to hand edit a recursive ascent parser, whereas a table-implementation is nigh unreadable to the average human.

Recursive ascent was first described by Thomas Penello in his article "Very fast LR parsing". in 1986. He was not intending to create a hand-editable implementation of an LR parser, but rather a maintainable and highly performant parser implemented in assembly language. The technique was later expounded upon by G.H. Roberts[2] in 1988 as well as Kruseman Aretz[3] in 1992 as part of the textbook, Theoretical Computer Science also by Leermakers and Augusteijn. An extremely readable description of the technique was written by Morell and Middleton[4] in 2003.

Summary

Intuitively, recursive ascent is a literal implementation of the LR parsing concept. Each function in the parser represents a single LR automaton state. Within each function, a multi-branch statement is used to select the appropriate action based on the current token popped off the input stack. Once the token has been identified, action is taken based on the state being encoded. There are two different fundamental actions which may be taken based on the token in question:

  • Shift - Encoded as a function call, effectively jumping to a new automaton state.
  • Reduce - Encoded differently according to the semantic action routine for the relevant production. The result of this routine is wrapped in an ADT which is returned to the caller. The reduce action must also record the number of tokens which were shifted prior to the reduce, passing this value back to the caller along with the reduce value. This shift counter determines at which point up the call stack the reduce should be handled.

There is also a third LR automaton action which may be taken in a given state, but only after a reduce where the shift counter has decremented to zero (indicating that the current state should handle the result). This is the goto action, which is essentially a special case of shift designed to handle non-terminals in a production. This action must be handled after the multi-branch statement, since this is where any reduction results will "resurface" from farther down the call stack.

Example

Consider the following grammar in bison syntax:

expr : expr '+' term   { $$ = $1 + $3; }
     | expr '-' term   { $$ = $1 - $3; }
     | term            { $$ = $1; }
     ;

term : '(' expr ')'    { $$ = $2; }
     | num             { $$ = $1; }
     ;

num : '0'              { $$ = 0; }
    | '1'              { $$ = 1; }
    ;

This grammar is LR(0) in that it is left-recursive (in the expr non-terminal) but does not require any lookahead. Recursive ascent is also capable of handling grammars which are LALR(1) in much the same way that table-driven parsers handle such cases (by pre-computing conflict resolutions based on possible lookahead).

The following is a Scala implementation of a recursive ascent parser based on the above grammar:

object ExprParser {
  private type Result = (NonTerminal, Int)
  
  private sealed trait NonTerminal {
    val v: Int
  }
  
  private case class NTexpr(v: Int, in: Stream[Char]) extends NonTerminal
  private case class NTterm(v: Int, in: Stream[Char]) extends NonTerminal
  private case class NTnum(v: Int, in: Stream[Char]) extends NonTerminal
  
  class ParseException(msg: String) extends RuntimeException(msg) {
    def this() = this("")
    
    def this(c: Char) = this(c.toString)
  }
  
  object #:: {
    def unapply[A](str: Stream[A]) = str match {
      case Stream.cons(hd, tail) => Some((hd, tail))
      case Stream() => None
    }
  }
  
  def parse(in: Stream[Char]) = state0(in)._1.v
  
  /*
   * 0 $accept: . expr $end
   *
   * '('  shift, and go to state 1
   * '0'  shift, and go to state 2
   * '1'  shift, and go to state 3
   *
   * expr  go to state 4
   * term  go to state 5
   * num   go to state 6
   */
  private def state0(in: Stream[Char]) = in match {
    case cur #:: tail => {
      def loop(tuple: Result): Result = {
        val (res, goto) = tuple
        
        if (goto == 0) {
          loop(res match {
            case NTexpr(v, in) => state4(in, v)
            case NTterm(v, in) => state5(in, v)
            case NTnum(v, in) => state6(in, v)
          })
        } else (res, goto - 1)
      }
      
      loop(cur match {
        case '(' => state1(tail)
        case '0' => state2(tail)
        case '1' => state3(tail)
        case c => throw new ParseException(c)
      })
    }
    
    case Stream() => throw new ParseException
  }
  
  /*
   * 4 term: '(' . expr ')'
   *
   * '('  shift, and go to state 1
   * '0'  shift, and go to state 2
   * '1'  shift, and go to state 3
   *
   * expr  go to state 7
   * term  go to state 5
   * num   go to state 6
   */
  private def state1(in: Stream[Char]): Result = in match {
    case cur #:: tail => {
      def loop(tuple: Result): Result = {
        val (res, goto) = tuple
        
        if (goto == 0) {
          loop(res match {
            case NTexpr(v, in) => state7(in, v)
            case NTterm(v, in) => state5(in, v)
            case NTnum(v, in) => state6(in, v)
          })
        } else (res, goto - 1)
      }
      
      loop(cur match {
        case '(' => state1(tail)
        case '0' => state2(tail)
        case '1' => state3(tail)
        case c => throw new ParseException(c)
      })
    }
    
    case Stream() => throw new ParseException
  }
  
  /*
   * 6 num: '0' .
   *
   * $default  reduce using rule 6 (num)
   */
  private def state2(in: Stream[Char]) = (NTnum(0, in), 0)
  
  /*
   * 7 num: '1' .
   *
   * $default  reduce using rule 7 (num)
   */
  private def state3(in: Stream[Char]) = (NTnum(1, in), 0)
  
  /*
   * 0 $accept: expr . $end
   * 1 expr: expr . '+' term
   * 2     | expr . '-' term
   *
   * $end  shift, and go to state 8
   * '+'   shift, and go to state 9
   * '-'   shift, and go to state 10
   */
  private def state4(in: Stream[Char], arg1: Int): Result = in match {
    case cur #:: tail => {
      decrement(cur match {
        case '+' => state9(tail, arg1)
        case '-' => state10(tail, arg1)
        case c => throw new ParseException(c)
      })
    }
    
    case Stream() => state8(arg1)
  }
  
  /*
   * 3 expr: term .
   *
   * $default  reduce using rule 3 (expr)
   */
  private def state5(in: Stream[Char], arg1: Int) = (NTexpr(arg1, in), 0)
  
  /*
   * 5 term: num .
   *
   * $default  reduce using rule 5 (term)
   */
  private def state6(in: Stream[Char], arg1: Int) = (NTterm(arg1, in), 0)
  
  /*
   * 1 expr: expr . '+' term
   * 2     | expr . '-' term
   * 4 term: '(' expr . ')'
   *
   * '+'  shift, and go to state 9
   * '-'  shift, and go to state 10
   * ')'  shift, and go to state 11
   */
  private def state7(in: Stream[Char], arg1: Int): Result = in match {
    case cur #:: tail => {
      decrement(cur match {
        case '+' => state9(tail, arg1)
        case '-' => state10(tail, arg1)
        case ')' => state11(tail, arg1)
        case c => throw new ParseException(c)
      })
    }
    
    case Stream() => throw new ParseException
  }
  
  /*
   * 0 $accept: expr $end .
   *
   * $default  accept
   */
  private def state8(arg1: Int) = (NTexpr(arg1, Stream()), 1)
  
  /*
   * 1 expr: expr '+' . term
   *
   * '('  shift, and go to state 1
   * '0'  shift, and go to state 2
   * '1'  shift, and go to state 3
   *
   * term  go to state 12
   * num   go to state 6
   */
  private def state9(in: Stream[Char], arg1: Int) = in match {
    case cur #:: tail => {
      def loop(tuple: Result): Result = {
        val (res, goto) = tuple
        
        if (goto == 0) {
          loop(res match {
            case NTterm(v, in) => state12(in, arg1, v)
            case NTnum(v, in) => state6(in, v)
            case _ => throw new AssertionError
          })
        } else (res, goto - 1)
      }
      
      loop(cur match {
        case '(' => state1(tail)
        case '0' => state2(tail)
        case '1' => state3(tail)
        case c => throw new ParseException(c)
      })
    }
    
    case Stream() => throw new ParseException
  }
  
  /*
   * 2 expr: expr '-' . term
   *
   * '('  shift, and go to state 1
   * '0'  shift, and go to state 2
   * '1'  shift, and go to state 3
   *
   * term  go to state 13
   * num   go to state 6
   */
  private def state10(in: Stream[Char], arg1: Int) = in match {
    case cur #:: tail => {
      def loop(tuple: Result): Result = {
        val (res, goto) = tuple
        
        if (goto == 0) {
          loop(res match {
            case NTterm(v, in) => state13(in, arg1, v)
            case NTnum(v, in) => state6(in, v)
            case _ => throw new AssertionError
          })
        } else (res, goto - 1)
      }
      
      loop(cur match {
        case '(' => state1(tail)
        case '0' => state2(tail)
        case '1' => state3(tail)
        case c => throw new ParseException(c)
      })
    }
    
    case Stream() => throw new ParseException
  }
  
  /*
   * 4 term: '(' expr ')' .
   *
   * $default  reduce using rule 4 (term)
   */
  private def state11(in: Stream[Char], arg1: Int) = (NTterm(arg1, in), 2)
  
  /*
   * 1 expr: expr '+' term .
   *
   * $default  reduce using rule 1 (expr)
   */
  private def state12(in: Stream[Char], arg1: Int, arg2: Int) = (NTexpr(arg1 + arg2, in), 2)
  
  /*
   * 2 expr: expr '-' term .
   *
   * $default  reduce using rule 2 (expr)
   */
  private def state13(in: Stream[Char], arg1: Int, arg2: Int) = (NTexpr(arg1 - arg2, in), 2)
  
  private def decrement(tuple: Result) = {
    val (res, goto) = tuple
    assert(goto != 0)
    (res, goto - 1)
  }
}

See Also

External

References

  1. ^ Thomas J Penello (1986). "Very fast LR parsing".
  2. ^ G.H. Roberts (1988). "Recursive ascent: an LR analog to recursive descent".
  3. ^ Kruseman Aretz (1992). "A functional LR parser".
  4. ^ Larry Morell and David Middleton (2003). "Recursive-ascent parsing".