Using profile data to improve performance.

Or data-driven performance optimizations.

Jul 03, 2021

A number of compiler optimizations depend on the likelihood of a particular statement being executed. Some of them reduce the number of page faults encountered while loading code, some reduce branch predictor misses, and some reorder the computations in if conditions so that the most likely ones are executed first.

This method of improving performance is so common that C++20 provides a standard way to hint likelihood via the attributes [[likely]] and [[unlikely]]. To see how they work in practice, consider a simple if/else statement whose first branch calls a function f1.
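A minimal sketch of such an example follows. The functions f1 and f2, the wrapper process, and the x > 0 condition are hypothetical stand-ins, defined inline here only so that the snippet is self-contained:

```cpp
// Hypothetical stand-ins for the work done in each branch.
int f1(int x) { return x + 1; }
int f2(int x) { return x * 2; }

// No likelihood hints: the compiler has no reason to prefer one branch,
// and will often keep the branches in source order.
int process(int x) {
    if (x > 0) {
        return f1(x);   // first branch
    } else {
        return f2(x);   // else branch
    }
}
```

With bodies this small an optimizing compiler may simply inline both calls, so the layout effect is easier to observe when f1 and f2 live in a separate translation unit.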

The compiled version of this example matches the source code: the branches appear in the same order, which means that the first branch performs better thanks to CPU pipelining. But what if we know that the first branch is actually unlikely in practice? Sure, we can always rewrite the source to swap the branches, but although that would be easy for this trivial example, in practice such refactoring may involve risk and complexity. That's where compiler hint attributes come into the picture: by adding [[unlikely]] to the first branch, we let the compiler know that it should optimize the binary code for the else clause.
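As a sketch, the hinted version could look like this. The names f1, f2, and process and the x > 0 condition are again assumptions for illustration; only the placement of [[unlikely]] is the point:

```cpp
// Hypothetical stand-ins for the work done in each branch.
int f1(int x) { return x + 1; }
int f2(int x) { return x * 2; }

int process(int x) {
    // C++20: the attribute marks this branch as the cold path,
    // so the compiler is free to favor the else clause when laying out code.
    if (x > 0) [[unlikely]] {
        return f1(x);
    } else {
        return f2(x);
    }
}
```

The attribute changes only how the code is optimized, not what it computes, so swapping or removing the hint never alters the result of process.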

Great, but there is a problem: even if our assumptions about likelihood are correct at the moment, they can change over time, and it's unlikely (no pun intended) that someone will remember to update all the attributes to reflect these changes.

Fortunately, there is a solution that is based on profile information and does not require much effort:

1. Compile an instrumented binary that will be used to capture profile information: clang -std=c++20 hello.cpp deps.cpp -fprofile-generate
2. Run the binary under a load that is as close to production as possible: ./a.out. This produces a profile file such as default_15822701540018083322_0.profraw
3. Merge all profile files into a single one: /Library/Developer/CommandLineTools/usr/bin/llvm-profdata merge -output hello.profdata default_15822701540018083322_0.profraw
4. Optionally, inspect a report of the captured profile: /Library/Developer/CommandLineTools/usr/bin/llvm-profdata show -all-functions -counts -ic-targets hello.profdata
5. Compile the final binary using the generated profile: clang -std=c++20 -O3 hello.cpp deps.cpp -o hello -fprofile-use=hello.profdata

Note that when profile-guided optimization is used, it takes precedence over manually inserted compiler hint attributes: in the resulting optimized assembly, our [[unlikely]] attribute on the f1 branch was ignored, and the f1 branch was still chosen as the preferred one based on the profile.
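To make that conflict concrete, here is a sketch of a hint that disagrees with the actual workload. Everything in it is an assumption for illustration: the names process, run_workload, and BranchCounts, the x > 0 condition, and the 90/10 input split. The manual counters only mimic, conceptually, the per-branch counts that an instrumented build records:

```cpp
#include <cstddef>

// Hypothetical stand-ins for the work done in each branch.
int f1(int x) { return x + 1; }
int f2(int x) { return x * 2; }

// Hand-rolled counters, standing in for what instrumentation would collect.
struct BranchCounts {
    std::size_t first = 0;  // times the f1 branch was taken
    std::size_t other = 0;  // times the else branch was taken
};

int process(int x, BranchCounts& counts) {
    if (x > 0) [[unlikely]] {   // the hint claims this branch is cold
        ++counts.first;
        return f1(x);
    } else {
        ++counts.other;
        return f2(x);
    }
}

// Assumed workload: nine out of every ten inputs are positive,
// so the branch marked [[unlikely]] is in fact the hot one.
BranchCounts run_workload(int iterations) {
    BranchCounts counts;
    for (int i = 0; i < iterations; ++i) {
        int x = (i % 10 == 0) ? -1 : 1;
        process(x, counts);
    }
    return counts;
}
```

Under a workload like this, a profile would show the f1 branch dominating, which is the situation in which the measured data overrides the stale attribute.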

This approach is easy to integrate into CI so that profile information is regenerated continuously and used to produce highly optimized release binaries. As such, there is no excuse not to use this powerful and useful approach.

Software Bits Newsletter
