We all know that division is the hardest arithmetic operation - remember those long division lessons? It would be sensible to assume that the wider the type, the more time it takes to perform division. And those pesky floats and doubles are probably even slower. But have you ever measured the difference? I hadn't, until today that is.
I hacked together a trivial microbenchmark that fills an array with numbers of different types and repeatedly divides all of them by 3. I intentionally did not use a power of 2 so that compilers couldn't cheat with a simple bit shift.
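To make the setup concrete, here is a minimal sketch of such a benchmark using a plain chrono-timed loop; the array size, repeat count, and fill values are arbitrary placeholders rather than the exact configuration behind the numbers below.

```cpp
// Minimal sketch: time repeated division by 3 over an array of a given type.
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <vector>

template <typename T>
double bench_divide_by_3(std::size_t n, int repeats) {
    std::vector<T> data(n);
    for (std::size_t i = 0; i < n; ++i)
        data[i] = static_cast<T>(i % 100 + 1);  // arbitrary non-zero values

    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
        for (auto& x : data)
            x = static_cast<T>(x / static_cast<T>(3));  // divide by 3, not a power of 2
    auto stop = std::chrono::steady_clock::now();

    volatile T sink = data[0];  // keep the result alive so the loop isn't optimized away
    (void)sink;

    return std::chrono::duration<double, std::milli>(stop - start).count();
}

int main() {
    const std::size_t n = 1 << 20;  // placeholder sizes
    const int repeats = 100;
    std::printf("int8_t : %8.2f ms\n", bench_divide_by_3<std::int8_t>(n, repeats));
    std::printf("int32_t: %8.2f ms\n", bench_divide_by_3<std::int32_t>(n, repeats));
    std::printf("int64_t: %8.2f ms\n", bench_divide_by_3<std::int64_t>(n, repeats));
    std::printf("float  : %8.2f ms\n", bench_divide_by_3<float>(n, repeats));
    std::printf("double : %8.2f ms\n", bench_divide_by_3<double>(n, repeats));
}
```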
But before checking out the results, make your guess and try to list these benchmarks from fastest to slowest. I expected integral types to significantly outperform floating-point types, with smaller types taking less time - less work for the CPU, right?
Looks like I was very wrong! Somehow int8_t managed to end up the slowest, and what's more surprising is that the float and double benchmarks were the leaders! Float beat most integral types by roughly 4x and int8_t by about 10x. Double is about 1.8x slower than float but still much faster than all the integral types.
Does this mean we should switch from ints to floats and doubles? Not really, but it serves as a good reminder that our intuition is rarely correct when it comes to performance. Most likely, CPU engineers have a set of benchmarks they have to optimize and a set of priorities, and it just happens that higher FLOPS is what makes the headlines, which may be the reason those poor engineers optimize floating-point operations so heavily.