Last time we looked at how arguments are passed in Go and how that impacts performance. We also noted that C/C++ compilers use a different calling convention, one that passes arguments in CPU registers instead of on the stack. This approach is much more efficient, but it’s still not free, so it’s time to measure its impact on performance.
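To make "arguments in registers" concrete, here is a small illustration that assumes the x86-64 System V ABI used by GCC and Clang on Linux and macOS. Windows x64 uses a different convention, with only four register arguments. The function `sum7` is purely hypothetical; it takes seven arguments so that one of them spills onto the stack.

```cpp
// Assumed ABI: x86-64 System V (Linux, macOS). Windows x64 differs.
// The first six integer/pointer arguments are passed in registers:
//   a -> rdi, b -> rsi, c -> rdx, d -> rcx, e -> r8, f -> r9
// A seventh integer argument (g) no longer fits and is passed on the stack.
// The integer result is returned in rax.
long sum7(long a, long b, long c, long d, long e, long f, long g) {
    return a + b + c + d + e + f + g;
}
```

Even when every argument fits in a register, the caller still has to move values into those specific registers, and the call and return instructions themselves cost cycles.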
For the benchmarks, we’ll use two integer multiplication functions, one with inlining enabled and one with it disabled.
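The original listing isn’t reproduced here, so the following is a minimal sketch of such a pair, assuming GCC or Clang. The names `mul_noinline` and `mul_inline` are hypothetical, and MSVC would spell the attributes `__declspec(noinline)` and `__forceinline` instead.

```cpp
// Hypothetical names; GCC/Clang attribute syntax.

// Never inlined: every call site keeps a real call/return sequence,
// with a and b passed in registers (edi/esi on x86-64 System V).
__attribute__((noinline)) int mul_noinline(int a, int b) {
    return a * b;
}

// Always inlined: the body is substituted into the caller,
// so only the multiplication itself remains.
inline __attribute__((always_inline)) int mul_inline(int a, int b) {
    return a * b;
}
```

Both functions compute the same thing; the only difference is whether the compiler is allowed to remove the call.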
We’ll also be using one of the best C++ benchmarking frameworks, Google Benchmark.
Before we run the benchmark, let’s make sure it’s going to execute the expected assembly. Sure enough, the non-inlined version compiles to a function call, while the inlined version compiles to a direct multiplication instruction.
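One way to repeat this check, assuming GCC or Clang on x86-64, is to compile a small file with `-O2 -S` (or paste it into Compiler Explorer) and compare the two callers. The `call_noinline`/`call_inline` wrappers are hypothetical. The `+ 1` stops the compiler from turning the non-inlined call into a tail-call `jmp`, which would make the listing harder to compare.

```cpp
// Hypothetical file for inspecting codegen, e.g. `g++ -O2 -S inspect.cpp`.
__attribute__((noinline)) int mul_noinline(int a, int b) { return a * b; }
inline __attribute__((always_inline)) int mul_inline(int a, int b) { return a * b; }

// Expected to contain a `call` to mul_noinline followed by an add:
// the arguments are already in registers, but control still leaves and returns.
int call_noinline(int a, int b) { return mul_noinline(a, b) + 1; }

// Expected to contain an `imul` and an add (or lea), with no call at all.
int call_inline(int a, int b) { return mul_inline(a, b) + 1; }
```

The exact instructions depend on the compiler version and flags, so treat the comments as what to look for rather than guaranteed output.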
Now that we’ve verified that our setup is correct, it’s finally time to get the numbers. Setting up a C++ benchmark requires some boilerplate, so I was recently pleased to discover quick-bench.com, which takes care of all of it, runs the benchmarks in the cloud and produces a nice comparison chart.
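The benchmark reports that the non-inlined version is about 5 times slower. Keep in mind that the function body here is a single multiplication, so the call overhead dominates the measurement; for larger functions the relative gain from inlining shrinks. Still, for small functions on hot paths, inlining is an important factor to consider when looking for performance optimizations.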
The full benchmark and its source can be found on quick-bench.com.