With decades of research and development behind them, optimizing compilers have reached a point where I rarely have to think about whether a particular optimization is supported. That’s certainly the case with compilers like Clang and GCC, but languages like Go prioritize compilation speed over the performance of the generated code and, as such, may not meet certain expectations.
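One of the most basic compiler optimizations is hoisting, which moves loop-invariant code out of loops. For example, in a loop whose body multiplies
x * y * z
and never modifies any of the three operands, the product is an invariant that does not change from iteration to iteration, so it would be reasonable to expect the compiler to compute it once, before the loop, instead of on every iteration.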
But a quick glance at the generated assembly reveals that the multiplication is performed on every loop iteration, which is very different from the HoistedMult version, where the multiplication is performed only once.
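As a minimal sketch of the two variants being compared: only the name HoistedMult comes from the text above, while Mult, the signatures, and the loop body are assumptions made for illustration, not the original listing.

```go
package main

import "fmt"

// Mult (hypothetical name) recomputes x*y*z in the loop body,
// even though none of the operands changes between iterations.
func Mult(dst []int, x, y, z int) {
	for i := range dst {
		dst[i] = i + x*y*z
	}
}

// HoistedMult computes the loop-invariant product once, before the loop.
// The signature is an assumption; only the name appears in the text.
func HoistedMult(dst []int, x, y, z int) {
	xyz := x * y * z
	for i := range dst {
		dst[i] = i + xyz
	}
}

func main() {
	a := make([]int, 4)
	b := make([]int, 4)
	Mult(a, 2, 3, 4)
	HoistedMult(b, 2, 3, 4)
	fmt.Println(a) // [24 25 26 27]
	fmt.Println(b) // [24 25 26 27]
}
```

Both functions produce the same result; the only difference is where the multiplication happens. To check what the compiler actually emits for each one, the assembly can be printed with `go build -gcflags=-S`. The output may differ between Go versions and target architectures.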
Now that we are done with the theory, let’s find out whether this matters in practice. Unfortunately, it does: benchmarking the two versions shows an almost 40% performance regression for the version that multiplies inside the loop.
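One way to run such a comparison without a separate test file is `testing.Benchmark` from the standard library. The sketch below is not the original benchmark: the function bodies, the slice size, and the argument values are assumptions, and `//go:noinline` is added so that the constant arguments are not folded at the call site. Timings will vary with the machine and the Go version.

```go
package main

import (
	"fmt"
	"testing"
)

// Mult (hypothetical) multiplies inside the loop body.
//
//go:noinline
func Mult(dst []int, x, y, z int) {
	for i := range dst {
		dst[i] = i + x*y*z
	}
}

// HoistedMult (assumed signature) multiplies once, before the loop.
//
//go:noinline
func HoistedMult(dst []int, x, y, z int) {
	xyz := x * y * z
	for i := range dst {
		dst[i] = i + xyz
	}
}

func main() {
	dst := make([]int, 1<<12) // arbitrary size chosen for the sketch

	inLoop := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			Mult(dst, 2, 3, 4)
		}
	})
	hoisted := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			HoistedMult(dst, 2, 3, 4)
		}
	})

	// BenchmarkResult prints the iteration count and ns/op.
	fmt.Println("Mult:       ", inLoop)
	fmt.Println("HoistedMult:", hoisted)
}
```

Comparing the two ns/op figures gives the relative cost of leaving the multiplication in the loop on a given setup.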
Bonus: Rust is known for being performance-focused, so, as expected, the equivalent Rust code not only benefits from hoisting but also leverages vectorization, proving again its well-deserved reputation.