I absolutely love exploring generated assembly, learning the fascinating approaches compilers use to make the most out of our hardware and trying to spot anything they or I have missed. But this is not a very scalable approach: there is way too much assembly to review, and every compiler upgrade or code change can invalidate our assumptions and drastically change the produced machine code.
So what would be a more scalable approach? Go is famous for its well-thought-out toolchain, which among other things provides hints about applied and missed optimizations such as bounds check elimination and escape analysis. But C++ remains the leader when it comes to high performance, so it would be great to have the same level of toolchain support. Fortunately, LLVM can emit a number of compiler remarks that report applied and missed optimizations, sometimes accompanied by useful explanations.
For example, in one of the previous posts we’ve looked at how LLVM is able to replace natural number summation with a closed formula, but as that transformation relies on undefined behavior, it would be great to have a way to verify whether clang was actually able to apply it. It turns out to be pretty easy: adding the -Rpass=.* flag when compiling
int sum_of_n(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
asks LLVM to emit remarks about successful optimizations
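As a concrete sketch (the file name sum.c is my own choice, and clang is assumed to be on PATH), the whole check boils down to:

```shell
# Hypothetical file name; the body is the sum_of_n function from above.
cat > sum.c <<'EOF'
int sum_of_n(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
EOF

# -O2 enables the optimization pipeline; the quotes keep the shell from
# glob-expanding the .* regex. Remarks are printed to stderr.
clang -O2 -c '-Rpass=.*' sum.c
```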
This is reassuring and can be easily integrated with continuous integration infrastructure to prevent any regressions.
Using the same flag, we can get information about vectorization - for the below snippet
void add(int * A, int * B, int n) {
    for (int i = 0; i < n; ++i) {
        A[i] += B[i];
    }
}
we get the expected vectorization remark.
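A sketch of the invocation (file name add.c is mine; clang is assumed to be installed):

```shell
cat > add.c <<'EOF'
void add(int * A, int * B, int n) {
    for (int i = 0; i < n; ++i) {
        A[i] += B[i];
    }
}
EOF

# The success remark for this loop comes from the loop-vectorize pass;
# -Rpass=.* matches it along with every other pass that fires.
clang -O2 -c '-Rpass=.*' add.c
```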
Unfortunately things don’t always go as planned, and in some cases optimizations cannot be applied. For example, LLVM is unable to vectorize the snippet below
void process(int * A, int n) {
    for (int i = 0; i < n; ++i) {
        switch (A[i]) {
        case 0: A[i] = i * 2; break;
        case 1: A[i] = i; break;
        default: A[i] = 0;
        }
    }
}
and all we need to do to get this confirmed is to pass the -Rpass-missed=.* flag:
It’s useful to know that this snippet couldn’t be vectorized, but it would be even better if we knew why. The -Rpass-analysis=.* flag to the rescue:
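A sketch of both checks on the snippet above (file name process.c is mine; clang is assumed to be available):

```shell
cat > process.c <<'EOF'
void process(int * A, int n) {
    for (int i = 0; i < n; ++i) {
        switch (A[i]) {
        case 0: A[i] = i * 2; break;
        case 1: A[i] = i; break;
        default: A[i] = 0;
        }
    }
}
EOF

# First ask which optimizations were missed...
clang -O2 -c '-Rpass-missed=.*' process.c
# ...then ask for the analysis explaining why they were missed.
clang -O2 -c '-Rpass-analysis=.*' process.c
```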
All of the flags above (-Rpass, -Rpass-missed and -Rpass-analysis) accept a regular expression that selects the passes they apply to, so .* asks LLVM to emit remarks for every pass, which can be overwhelming and unnecessary for large code bases. Conveniently, every remark above includes the specific flag that would trigger remarks for the pass that emitted it, so instead of -Rpass-analysis=.* it’s possible to be more specific and enable analysis remarks just for the loop-vectorize pass using the -Rpass-analysis=loop-vectorize flag.
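Sketching the narrowed-down invocation on the same snippet (file name process.c is mine; clang assumed on PATH):

```shell
cat > process.c <<'EOF'
void process(int * A, int n) {
    for (int i = 0; i < n; ++i) {
        switch (A[i]) {
        case 0: A[i] = i * 2; break;
        case 1: A[i] = i; break;
        default: A[i] = 0;
        }
    }
}
EOF

# No quoting needed here: loop-vectorize contains no shell glob characters.
# Only the loop vectorizer's analysis remarks are emitted now.
clang -O2 -c -Rpass-analysis=loop-vectorize process.c
```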
Since inlining enables a number of other optimizations, it’s extremely useful to know whether specific functions were inlined, and the inline pass kindly provides this information. It’s sufficient to pass the -Rpass=inline flag when compiling
int get_x(int x) {
    return x;
}

int use_get_x() {
    return get_x(1);
}
to get the corresponding inlining remark.
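A sketch of this check (file name inline_demo.c is mine; clang assumed to be installed):

```shell
cat > inline_demo.c <<'EOF'
int get_x(int x) {
    return x;
}

int use_get_x() {
    return get_x(1);
}
EOF

# Emit remarks only from the inliner; at -O2 the trivial get_x(1) call
# is expected to be inlined into use_get_x.
clang -O2 -c -Rpass=inline inline_demo.c
```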
I highly recommend exploring LLVM remarks - it’s a user-friendly view into compiler optimization strategies and a great way to perform optimization regression testing.