Standing on the shoulders of giants.

Or revisiting Go's function call overhead.

In a previous article on function call overhead, we discussed Go’s excessive function call overhead, caused by its way of passing arguments via the stack. If you’d like to refresh your memory, you can check the original article below:

Software Bits Newsletter
Function call overhead.
Function inlining is a very powerful optimization with advantages that include enabling other optimizations and removal of function call overhead but may also result in code bloat that hurts instruction cache efficiency. But today we’re going to discuss why it’s extremely important in Go…

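TL;DR: Passing arguments via the stack takes additional CPU instructions, which increases binary size and adds runtime overhead. For a trivial function that multiplies two integers, instead of a single multiply instruction followed by a return, the Go compiler before version 1.17 would generate extra instructions to load both arguments from the stack and to write the result back to it.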

Fortunately, by relying on optimizing compilers we stand on the shoulders of giants, and as time goes by we benefit from new optimizations. One of them arrived in the recently released Go 1.17: the transition from passing function arguments and results on the stack to the register-based approach used by C++, Rust and other high-performance languages (in Go 1.17 it is enabled on 64-bit x86 for Linux, macOS and Windows). So let’s see how it affects our benchmarks:
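The post doesn't show call_overhead_test.go itself. Judging by the disassembly below (a single IMULQ per function) and the benchmark names, the functions under test most likely just multiply two integers. The following is a minimal, self-contained sketch under that assumption, not the original file: the helpers benchNoInline, benchInline and sink are hypothetical, and it uses testing.Benchmark so it can be run with go run instead of go test.

```go
// Sketch of the benchmark, ASSUMING the measured functions simply multiply
// two ints (inferred from the IMULQ in the disassembly, not from the source).
package main

import (
	"fmt"
	"testing"
)

// MultNoInline multiplies two ints. The directive below stops the compiler
// from inlining it, so every call pays the full call overhead. Before Go 1.17
// a and b were read from the stack and the result written back to it; with
// the register-based ABI on amd64, a arrives in AX, b in BX, and the result
// is returned in AX.
//
//go:noinline
func MultNoInline(a, b int) int {
	return a * b
}

// MultInline is identical but small enough to be inlined at call sites.
func MultInline(a, b int) int {
	return a * b
}

// sink receives every result so the calls can't be eliminated as dead code.
var sink int

// benchNoInline is a hypothetical benchmark body for the non-inlined call.
func benchNoInline(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = MultNoInline(i, i+1)
	}
}

// benchInline is a hypothetical benchmark body for the inlinable call.
func benchInline(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = MultInline(i, i+1)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of `go test`.
	fmt.Println("MultNoInline:", testing.Benchmark(benchNoInline))
	fmt.Println("MultInline:  ", testing.Benchmark(benchInline))
}
```

Building this once with Go 1.16 and once with Go 1.17 or later on amd64 should expose the difference between the two calling conventions for MultNoInline; absolute numbers will vary from machine to machine.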

% go tool compile -o call_overhead_test call_overhead_test.go
% go tool objdump call_overhead_test
TEXT "".MultNoInline(SB) gofile../testing/call_overhead_test.go
  call_overhead_test.go:10      0x1062                  480fafc3                IMULQ BX, AX
  call_overhead_test.go:10      0x1066                  c3                      RET

TEXT "".MultInline(SB) gofile../call_overhead_test.go
  call_overhead_test.go:14      0x1067                  480fafc3                IMULQ BX, AX
  call_overhead_test.go:14      0x106b                  c3                      RET
...
% go test -bench=BenchmarkMult
goos: darwin
goarch: amd64
pkg: example.com/testing
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkMultNoInline-16        603527020                1.981 ns/op
BenchmarkMultInline-16          562753930                2.134 ns/op

It looks like with the new argument-passing approach the non-inlined call is even slightly faster than the inlined one (1.981 vs 2.134 ns/op), instead of being ~30% slower as it was with the previous version of the Go compiler. And the best part is that to enjoy these benefits, all we need to do is upgrade the compiler and recompile the code.