In a previous article on function call overhead we discussed Go’s excessive function call overhead, caused by the way it passes arguments via the stack. If you’d like to refresh your memory, you can check the original article below.
TL;DR: Using the stack involves additional CPU instructions, which increases binary size and adds runtime overhead, since for a trivial function like
instead of
the Go compiler before version 1.17 would generate
Fortunately, by leveraging optimizing compilers we stand on the shoulders of giants, and as time goes by we benefit from new optimizations. In the recently released Go 1.17, one of these optimizations is the transition from passing function arguments on the stack to a register-based approach, as used in C++, Rust and other high-performance languages. So let’s see how it affects our benchmarks:
% go tool compile -o call_overhead_test call_overhead_test.go
% go tool objdump call_overhead_test
TEXT "".MultNoInline(SB) gofile../testing/call_overhead_test.go
call_overhead_test.go:10 0x1062 480fafc3 IMULQ BX, AX
call_overhead_test.go:10 0x1066 c3 RET
TEXT "".MultInline(SB) gofile../call_overhead_test.go
call_overhead_test.go:14 0x1067 480fafc3 IMULQ BX, AX
call_overhead_test.go:14 0x106b c3 RET
...
% go test -bench=BenchmarkMult
Compiling module...
Instantiating module...
goos: darwin
goarch: amd64
pkg: example.com/testing
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkMultNoInline-16 603527020 1.981 ns/op
BenchmarkMultInline-16 562753930 2.134 ns/op
It looks like, with the new argument-passing approach, the non-inlined call is even slightly faster than the inlined version in this run (1.981 ns/op vs 2.134 ns/op), instead of being ~30% slower as was the case with the previous version of the Go compiler. And the best part is that to enjoy these benefits all we need to do is upgrade the compiler and recompile the code.
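To try the comparison yourself, here is a minimal sketch of what such a benchmark could look like. The names MultNoInline and MultInline come from the disassembly above. The function bodies are an assumption: two int arguments multiplied together, which is consistent with the single IMULQ BX, AX instruction. The sink variable, the benchNoInline and benchInline helpers, and the use of testing.Benchmark from main (so that it runs with plain go run rather than go test -bench) are illustrative choices, not the original listing.

```go
package main

import (
	"fmt"
	"testing"
)

// MultNoInline multiplies two integers. The go:noinline directive stops
// the compiler from inlining it, so each call pays the full call cost.
// The body is assumed; it is not taken from the original listing.
//
//go:noinline
func MultNoInline(a, b int) int {
	return a * b
}

// MultInline has the same (assumed) body but stays eligible for inlining.
func MultInline(a, b int) int {
	return a * b
}

// sink receives every result so the compiler cannot discard the
// multiplications as dead code.
var sink int

// benchNoInline is a hypothetical helper that times the non-inlined call.
func benchNoInline(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = MultNoInline(i, 3)
	}
}

// benchInline is a hypothetical helper that times the inlinable call.
func benchInline(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = MultInline(i, 3)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test"
	// and returns a result that prints iterations and ns/op.
	fmt.Println("MultNoInline:", testing.Benchmark(benchNoInline))
	fmt.Println("MultInline:  ", testing.Benchmark(benchInline))
}
```

Building this with a toolchain older than 1.17 and again with 1.17 or newer should show how the gap between the two variants changes. The absolute numbers depend on the machine, so expect them to differ from the ones above.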