Optimizing compiler
Appearance
There is a large body of work associated with the topic of compiler optimization, mostly consisting of a wide variety of individual techniques.
Techniques can be (roughly) divided into two catagories, local and global. Local techniques tend to be easier to implement, but result in lesser gains, while global techniques make the opposite tradeoff.
A local technique commonly used on some older processors is to XOR a register by itself, rather than copy the value 0 into it. It is up to the compiler to know whether or not that is a faster way of getting 0 into a register.
Some optimization techniques are:
- loop unrolling
- Attempts to reduce the overhead inherent in testing a loop's condition, by determining at compile time the number of iterations the loop will go through, and emitting duplicate code, a number of times equal to the iteration count. Obviously, it can only be used if the iteration count can be determined at compile time.
- loop combining
- Another technique which attempts to reduce loop overhead. When two adjacent loops would iterate the same number of times (whether or not that number is known at compile time), their body's can be combined as long as they make no reference to each other's data.
- loop interchange (swapping inner and outer loops)
- common subexpression elimination
- In the expression "(a+b)-(a+b)/4", "common subexpression" refers to the duplicated "(a+b)". Compilers implementing this technique realize that "(a+b)" won't change, and as such, only calculate its value once.
- test reordering
- inlining of procedures
- constant folding and propagation
- instruction scheduling
- dead code elimination
- "Dead code" refers to code segments which can never be executed. Dead code elimination prevents the compiler from emitting code for such segments, saving on CPU instruction cache.
- using CPU instructions with complex offsets to do math
- code-block reordering to reduce conditional branches and improve locality of reference
- factoring out of invariants (moving invariant expressions out of loops, e.g.)
- sentinels
- removing recursion
- strength reduction
- reduction of cache collisions (e.g. by disrupting alignment within a page)
please expand this list, and add references to papers/books discussing each technique.