Optimizing compiler

There is a large body of work associated with the topic of compiler optimization, mostly consisting of a wide variety of individual techniques.

Techniques can be (roughly) divided into two catagories, local and global. Local techniques tend to be easier to implement, but result in lesser gains, while global techniques make the opposite tradeoff.

A local technique commonly used on some older processors is to XOR a register by itself, rather than copy the value 0 into it. It is up to the compiler to know whether or not that is a faster way of getting 0 into a register.

Some optimization techniques are:

loop unrolling: Attempts to reduce the overhead inherent in testing a loop's condition, by determining at compile time the number of iterations the loop will go through, and emitting duplicate code, a number of times equal to the iteration count. Obviously, it can only be used if the iteration count can be determined at compile time.
loop combining: Another technique which attempts to reduce loop overhead. When two adjacent loops would iterate the same number of times (whether or not that number is known at compile time), their body's can be combined as long as they make no reference to each other's data.
loop interchange (swapping inner and outer loops)
common subexpression elimination: In the expression "(a+b)-(a+b)/4", "common subexpression" refers to the duplicated "(a+b)". Compilers implementing this technique realize that "(a+b)" won't change, and as such, only calculate its value once.
test reordering
inlining of procedures
constant folding and propagation
instruction scheduling
dead code elimination: "Dead code" refers to code segments which can never be executed. Dead code elimination prevents the compiler from emitting code for such segments, saving on CPU instruction cache.
using CPU instructions with complex offsets to do math
code-block reordering to reduce conditional branches and improve locality of reference
factoring out of invariants (moving invariant expressions out of loops, e.g.)
sentinels
removing recursion
strength reduction
reduction of cache collisions (e.g. by disrupting alignment within a page)

please expand this list, and add references to papers/books discussing each technique.