With growing popularity of functional programming, its patterns gradually find their way into mainstream imperative languages like C++ and Rust. One of such patterns is recursion. In fact it’s so important for functional programming that languages like Scheme go all the way to requiring all of its implementations to be properly tail-recursive:
3.5 Proper tail recursion
Implementations of Scheme are required to be properly tail-recursive.
It’s very rare for language specifications to mandate required optimizations, so there there better be a good reason for this:
Rationale:
Intuitively, no space is needed for an active tail call because the continuation that is used in the tail call has the same semantics as the continuation passed to the procedure containing the call. Although an improper implementation might use a new continuation in the call, a return to this new continuation would be followed immediately by a return to the continuation passed to the procedure. A properly tail-recursive implementation returns to that continuation directly.
But what about C++? Should we expect tail call optimization? Let’s find out. First, we can take a look at an example of delegation. Suppose we know how to check if given number is even efficiently and want to implement a check for odd numbers. A most natural way to achieve this would be to do something like this:
In other words, a number is even, if it’s not odd. Unfortunately, we get the following assembly for it
which does not look very efficient but should be expected, since is_odd
is not in a tail position as we have to negate its result before returning it to is_even
callers.
So let’s try to implement is_even
using is_odd
in a tail position
In other words, if a number is even, then its predecessor, as well as successor, is odd. Not only produced assembly looks much better
but clang also adds a helpful comment letting us know that it recognized a tail call.
This is great, but if performance of a critical function at stake, how can we make sure that we get the same guarantee as Scheme developers? Unfortunately there is no standard way of ensuring this, but recently Clang has acquired a custom musttail
attribute that can do just that so both
and
can be used to ensure at compile time that tail call optimization is applicable. Also, in case you’re not sure if function call is in a tail position, you can use this attribute to ask the compiler. For instance, if we try to apply the attribute to a function call that is not in a tail position like
Clang will produce a helpful error message
Interestingly, our compilers are so smart that they are able to spot opportunities to rewrite our code to take advantage of a tail call optimization. So both both versions of factorial computation
produce the same highly optimized assembly.
I was very glad to see Scala adopting tailrec annotation which verifies that the method will be compiled with tail call optimization years ago and excited to see musttail
attribute supported by clang. Hopefully it’s coming to your compiler of choice soon as well.