I've noticed when comparing assembly output that, despite advances in compiler optimization, compilers still don't schedule around register dependencies well. When an instruction needs the result of the one just before it, the multi-stage pipeline can stall, so it is well worth hand-optimizing these in speed-critical sections. (a certain ray tracer springs to mind :p )
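To show the kind of dependency I mean, here is a minimal sketch in C rather than assembly. The function names `sum_serial` and `sum_split` are made up for illustration, and the choice of four accumulators is an assumption, not a tuned value. In the first loop every add has to wait for the previous one; in the second, the four running sums do not depend on each other, so the CPU is free to overlap them.

```c
#include <stddef.h>

/* One accumulator: each add depends on the result of the previous add,
   so the loop forms a single long dependency chain. */
float sum_serial(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four independent accumulators (hypothetical hand-optimized version):
   the four adds in the loop body do not depend on one another. */
float sum_split(const float *a, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;

    /* Main loop: process four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    /* Tail: pick up the remaining 0 to 3 elements. */
    for (; i < n; i++)
        s0 += a[i];

    /* Combine the partial sums at the end. */
    return (s0 + s1) + (s2 + s3);
}
```

One caveat: the split version adds the floats in a different order, so the result can differ in the last bits from the serial one. That is also why a compiler will generally not make this change for floating point by itself unless you relax the floating-point rules. Whether it is actually faster depends on the CPU and the compiler, so measure before keeping it.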
It is nice to see that compilers do XOR a register with itself to set it to zero.
Before everyone comments that the optimization settings do work, please note that I've only used one popular C compiler and have not tried the Intel one recently. I have also not looked at every possible optimization (only used the /O2 and /O3 flags) - the dependency one was just immediately obvious when looking at the code.