Function inlining is a powerful optimization: it removes function call overhead and enables other optimizations, but it can also cause code bloat that hurts instruction cache efficiency. Today we’re going to discuss why it is especially important in Go.
To make the discussion concrete, let’s take a look at the trivial multiplication function
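A minimal sketch of such a function, assuming two int parameters named x and y; the name Mult is hypothetical and chosen only for illustration:

```go
package main

// Mult is a hypothetical name for the trivial multiplication function.
// It takes two ints, x and y, and returns their product.
func Mult(x, y int) int {
	return x * y
}
```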
Its first two assembly instructions move x and y from the stack into registers AX and CX, which looks strange to anyone used to native programming languages such as C/C++ or Rust, where code like
is translated into assembly
that does not contain memory reads and relies on the System V ABI, under which the first six integer arguments are passed in the registers rdi, rsi, rdx, rcx, r8, and r9, and any further arguments are passed on the stack in reverse order.
Rust follows the same convention for such a function on x86-64 Linux, so
is translated into
Unfortunately, Go does not use the System V ABI and passes parameters on the stack, although there is an open proposal to switch to a register-based calling convention.
But does it really matter? How much overhead can we expect? Let’s find out.
For benchmarks we’ll use
Note that the MultNoInline function has a compiler directive that disables inlining, unlike MultInline, which will be inlined by the compiler. We’ll verify these expectations as soon as we use both functions in the benchmarks.
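A minimal sketch of what these functions and benchmarks could look like, assuming the trivial two-int multiplication discussed earlier. The //go:noinline directive is the standard way to disable inlining of a function. The loop bodies, the local accumulator, and the package-level sink variable are assumptions added here so that the compiler cannot discard the computed result:

```go
package main

import "testing"

// MultNoInline multiplies x and y. The directive below tells the
// compiler never to inline this function.
//
//go:noinline
func MultNoInline(x, y int) int {
	return x * y
}

// MultInline is identical but carries no directive, so the compiler
// is free to inline it at call sites.
func MultInline(x, y int) int {
	return x * y
}

// sink is an assumed helper: storing the result in a package-level
// variable keeps the compiler from eliminating the loop as dead code.
var sink int

func BenchmarkMultNoInline(b *testing.B) {
	r := 0
	for i := 0; i < b.N; i++ {
		r += MultNoInline(i, i)
	}
	sink = r
}

func BenchmarkMultInline(b *testing.B) {
	r := 0
	for i := 0; i < b.N; i++ {
		r += MultInline(i, i)
	}
	sink = r
}
```

Placed in a file whose name ends in _test.go, these can be run with go test -bench=. and the generated assembly can be inspected to confirm which call was inlined.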
And sure enough, BenchmarkMultNoInline contains a call to the MultNoInline function as well as move instructions that prepare the parameters
whereas BenchmarkMultInline contains the multiplication instruction directly
So what are the results?
So disabling inlining results in a ~30% performance regression for such a trivial case. The relative cost is not going to be as bad for more complex functions, but it should be considered when calling functions written in Go assembly, which are never inlined.
In conclusion, consider function call overhead for functions in hot loops, especially when using Go assembly.