I absolutely love exploring generated assembly, learning the fascinating approaches compilers use to make the most out of our hardware and trying to spot anything they or I have missed. But this is not a very scalable approach: there is way too much assembly to review, and every compiler upgrade or code change can invalidate our assumptions and drastically change the produced machine code.
So what would be a more scalable approach? Go is famous for its well-thought-out toolchain, which among other things provides hints about applied and missed optimizations such as bounds check elimination and escape analysis. But C++ remains the leader when it comes to high performance, so it would be great to have the same level of toolchain support. Fortunately, LLVM can emit a number of compiler remarks that report applied and missed optimizations, sometimes accompanied by useful explanations.
For example, in one of the previous posts we’ve looked at how LLVM is able to replace natural number summation with a closed formula, but as that transformation relies on undefined behavior, it would be great to have a way to verify whether clang was actually able to apply it. It turns out to be pretty easy: adding the -Rpass=.* flag when compiling
int sum_of_n(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
asks LLVM to emit remarks about successful optimizations
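As a concrete sketch (the file name sum.c is my own choice, and clang is assumed to be on PATH), the whole check boils down to:

```shell
# Hypothetical file name; the body is the sum_of_n function from above.
cat > sum.c <<'EOF'
int sum_of_n(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
EOF

# -O2 enables the optimization pipeline; the quotes keep the shell from
# glob-expanding the .* regex. Remarks are printed to stderr.
clang -O2 -c '-Rpass=.*' sum.c
```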
This is reassuring and can be easily integrated with continuous integration infrastructure to prevent any regressions.
Using the same flag, we can get information about vectorization - for the below snippet
void add(int * A, int * B, int n) {
    for (int i = 0; i < n; ++i) {
        A[i] += B[i];
    }
}
we get the expected vectorization remark.
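A sketch of the invocation (file name add.c is mine; clang is assumed to be installed):

```shell
cat > add.c <<'EOF'
void add(int * A, int * B, int n) {
    for (int i = 0; i < n; ++i) {
        A[i] += B[i];
    }
}
EOF

# The success remark for this loop comes from the loop-vectorize pass;
# -Rpass=.* matches it along with every other pass that fires.
clang -O2 -c '-Rpass=.*' add.c
```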
Unfortunately things don’t always go as planned, and in some cases optimizations cannot be applied. For example, LLVM is unable to vectorize the snippet below
void process(int * A, int n) {
    for (int i = 0; i < n; ++i) {
        switch (A[i]) {
        case 0: A[i] = i * 2; break;
        case 1: A[i] = i; break;
        default: A[i] = 0;
        }
    }
}
and all we need to do to get this confirmed is to pass the -Rpass-missed=.* flag:
It’s useful to know that this snippet couldn’t be vectorized, but it would be even better if we knew why. The -Rpass-analysis=.* flag to the rescue:
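A sketch of both checks on the snippet above (file name process.c is mine; clang is assumed to be available):

```shell
cat > process.c <<'EOF'
void process(int * A, int n) {
    for (int i = 0; i < n; ++i) {
        switch (A[i]) {
        case 0: A[i] = i * 2; break;
        case 1: A[i] = i; break;
        default: A[i] = 0;
        }
    }
}
EOF

# First ask which optimizations were missed...
clang -O2 -c '-Rpass-missed=.*' process.c
# ...then ask for the analysis explaining why they were missed.
clang -O2 -c '-Rpass-analysis=.*' process.c
```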
All of the flags above (-Rpass, -Rpass-missed and -Rpass-analysis) accept a regular expression that selects the passes they apply to, so .* asks LLVM to emit remarks for every pass, which can be overwhelming and unnecessary for large code bases. Conveniently, every remark above includes the specific flag that would trigger remarks for the pass that emitted it, so instead of -Rpass-analysis=.* it’s possible to be more specific and enable analysis remarks just for the loop-vectorize pass using the -Rpass-analysis=loop-vectorize flag.
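Sketching the narrowed-down invocation on the same snippet (file name process.c is mine; clang assumed on PATH):

```shell
cat > process.c <<'EOF'
void process(int * A, int n) {
    for (int i = 0; i < n; ++i) {
        switch (A[i]) {
        case 0: A[i] = i * 2; break;
        case 1: A[i] = i; break;
        default: A[i] = 0;
        }
    }
}
EOF

# No quoting needed here: loop-vectorize contains no shell glob characters.
# Only the loop vectorizer's analysis remarks are emitted now.
clang -O2 -c -Rpass-analysis=loop-vectorize process.c
```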
Since inlining enables a number of other optimizations, it’s extremely useful to know whether specific functions were inlined, and the inline pass kindly provides this information. It’s sufficient to pass the -Rpass=inline flag when compiling
int get_x(int x) {
    return x;
}

int use_get_x() {
    return get_x(1);
}
to get the corresponding inlining remark.
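A sketch of this check (file name inline_demo.c is mine; clang assumed to be installed):

```shell
cat > inline_demo.c <<'EOF'
int get_x(int x) {
    return x;
}

int use_get_x() {
    return get_x(1);
}
EOF

# Emit remarks only from the inliner; at -O2 the trivial get_x(1) call
# is expected to be inlined into use_get_x.
clang -O2 -c -Rpass=inline inline_demo.c
```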
I highly recommend exploring LLVM remarks - it’s a user-friendly view into compiler optimization strategies and a great way to perform optimization regression testing.