Sethi–Ullman algorithm
When generating code for arithemic expressions, the compiler has to decide which is the best way to translate the expression in terms of number of instructions used as well as number of registers needed to evaluate a certain subtree (especially if free registers are scarce). The so called Sethi-Ullman algorithm fulfills the property of producing code which needs the least number of instructions possible as well as the least number of storage references (under the assumption that at the most commutativity and associativity apply to the operators used, but laws like a * b + a * c = a * (b + c) do not hold). Please note that the algorithm succeeds as well if neither commutativity nor associativity hold for the expressions used, and therefore arithmetic transformations can not be applied.
The simple Sethi-Ullman algorithm works as follows (for a load-store architecture): for every non-constant leaf in the abstract syntax tree, it assigns a 1 (i.e. 1 register is needed to hold the variable/field/etc.). It then goes up in the tree, assigning the number of registers needed to evaluate the respective subtrees of a node. For every node n, the algorithm checks how many registers are needed by the node or leaf below. If the number of registers needed in the left subtree (l) are not equal to the number of registers needed in the right subtree (r), the number of registers needed for the current node is max(l, r). If l==r, then the number of registers needed for the current node is l+1.
How code should be emitted: if the number of registers needed to compute the left subtree is bigger than the number of registers for the right subtree, then the left subtree is evaluated first (since it may be possible that the one more register needed by the right subtree makes the left subtree spill). If the right subtree needs more registers than the left subtree, the right subtree is evaluated first accordingly. If both subtrees need equal much registers, then the order of evaluation is irrelevant.
In an advanced version of the Sethi-Ullman algorithm, the arithmetic expressions are first transformed, exploiting the algebraic properties of the operators used.
External links
- The Generation of Optimal Code for Arithmetic Expressions Ravi Sethi, J. D. Ullman, Journal of the ACM, Vol. 17, No. 4, October 1970, pp. 715-728
- Code Generation for Trees