One of my recent performance investigations led me to a small function named with_default. It looks very innocent, but, as the name of the function suggests, there is a default argument involved and, since luck is rarely on my side, the Dummy constructor is not free. Because of this, every with_default invocation that relies on the default calls this expensive constructor and causes all performance engineers some noticeable grief.
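To make the problem concrete, here is a minimal sketch of the pattern described above. The body of Dummy, the dummy_constructions counter, and the exact signature of with_default are assumptions made for illustration, not the code from the investigation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical counter, only here to make the constructions visible.
std::size_t dummy_constructions = 0;

// Hypothetical stand-in for the expensive type: its constructor allocates.
struct Dummy {
    std::vector<int> data;
    Dummy() : data(1024, 42) { ++dummy_constructions; }
};

// The default argument expression is evaluated at every call that omits it,
// so each such call pays for a fresh Dummy.
int with_default(const Dummy& d = Dummy()) {
    return d.data.front();
}
```

Calling with_default() in a loop constructs one Dummy per iteration, while passing an existing Dummy constructs none.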
There are many ways to deal with this sadness. The most straightforward way is to use a shared static variable, but since we don’t want to deal with the Static Initialization Order Fiasco, we can use a static local variable instead.
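Both alternatives can be sketched as follows. As before, the Dummy body, the counter, and the helper names default_dummy and local_default are hypothetical, and the signatures are assumed rather than taken from the original code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical counter, only here to make the constructions visible.
std::size_t dummy_constructions = 0;

// Hypothetical stand-in for the expensive type: its constructor allocates.
struct Dummy {
    std::vector<int> data;
    Dummy() : data(1024, 42) { ++dummy_constructions; }
};

// Variant 1: a namespace-scope static, constructed once before main() runs.
// Its initialization order relative to statics in other translation units
// is unspecified, which is the fiasco mentioned above.
static const Dummy default_dummy;

int with_static_default(const Dummy& d = default_dummy) {
    return d.data.front();
}

// Variant 2: a function-local static, constructed on first use. Later calls
// only pay for a check of the "already initialized" guard.
const Dummy& local_default() {
    static const Dummy d;
    return d;
}

int with_static_local_default(const Dummy& d = local_default()) {
    return d.data.front();
}
```

In this sketch the local static is wrapped in a helper so that it can still be used as a default argument; the original code may have been structured differently.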
Both versions avoid creating a new Dummy instance for each invocation and instead reuse a static global or static local variable.
Based on assembly alone, I’d expect both static usages to outperform our initial implementation, with the static global variable potentially having a slight edge, since it does not require an extra if that checks whether the default is initialized. It’s finally time to check our assumptions.
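For readers who want to reproduce the comparison, a rough timing helper could look like the sketch below. It is not the benchmark behind the numbers in this post; ns_per_call is a hypothetical name, and a proper benchmarking framework would give more trustworthy measurements.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: average wall-clock nanoseconds per call of f,
// where f is any callable returning something convertible to int.
template <typename F>
double ns_per_call(F&& f, std::uint64_t iterations) {
    volatile int sink = 0;  // volatile store keeps the calls from being optimized away
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        sink = f();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Each variant can then be wrapped in a lambda, for example ns_per_call([] { return with_default(); }, 1000000), and the per-call times compared.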
And the results are
So with_static_default is 1.2x faster than with_default, while with_static_local_default is 1.4x faster than with_default and 1.2x faster than with_static_default. I was a bit surprised to see that with_static_local_default was the winner, but since its branch is easy to predict, it probably shouldn’t come as a big surprise.
In any case, the moral of the story is that performance considerations should be applied equally if not more to APIs than to implementation.
Good one. As you said, the CPU will predict the right branch almost 100% of the time.