Recently I’ve been going through one of the best and most widely used libraries for vector similarity search - FAISS. It has a ton of algorithmic and implementation gems, so I never miss a chance to learn something new from it.
As expected, SIMD plays a big role in fast vector processing, but since not all vectors have a SIMD-register-friendly size, the tails need to be handled with care. FAISS has a handy masked_read function that converts the trailing vector elements into a 128-bit value.
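For readers without the FAISS source at hand, here is a minimal portable sketch of the idea, not the library's exact code. The name `masked_read_sketch` and the `std::array` return type are my assumptions; as far as I remember, the real function fills an aligned 4-float buffer the same way and then loads it into a 128-bit SSE register.

```cpp
#include <array>
#include <cassert>

// Hypothetical, portable stand-in for the tail read: copies the first d
// floats (0 <= d < 4) of x into a zero-filled 4-float buffer.
inline std::array<float, 4> masked_read_sketch(int d, const float* x) {
    assert(0 <= d && d < 4);
    std::array<float, 4> buf{};  // all lanes start at zero
    switch (d) {
        case 3:
            buf[2] = x[2];
            [[fallthrough]];
        case 2:
            buf[1] = x[1];
            [[fallthrough]];
        case 1:
            buf[0] = x[0];
            break;
        default:  // d == 0: nothing to copy
            break;
    }
    return buf;
}
```

Calling it with `d = 2` copies `x[0]` and `x[1]` and leaves the remaining two lanes at zero.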
Notice the interesting way the switch statement and case fall-through are used to dispatch the buffer writes. When I saw this, I thought that compilers would use a jump table to replace the case comparisons with a single table lookup.
But the generated assembly didn’t look the way I expected.
Looks like the compiler ignored this clever setup and effectively rewrote it as a bunch of ifs. But what if we really want to use a jump table? I always wanted to try the labels as values technique, which is not part of the C++ standard, but both Clang and GCC happily accept it.
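A minimal sketch of what such a version can look like, assuming the same portable 4-float buffer as a stand-in for the SSE register. The function name `masked_read_goto` and the label names are hypothetical, and the code relies on the GCC/Clang extension (`&&label` takes a label's address, `goto *ptr` jumps to it), so it will not build with a strictly conforming compiler.

```cpp
#include <array>
#include <cassert>

// Hypothetical name. Uses the non-standard "labels as values" extension
// supported by GCC and Clang.
std::array<float, 4> masked_read_goto(int d, const float* x) {
    assert(0 <= d && d < 4);
    // Explicit jump table: entry d is the label where copying starts.
    static void* const targets[4] = {&&tail0, &&tail1, &&tail2, &&tail3};
    std::array<float, 4> buf{};  // all lanes start at zero
    goto *targets[d];
tail3:
    buf[2] = x[2];
tail2:
    buf[1] = x[1];
tail1:
    buf[0] = x[0];
tail0:
    return buf;
}
```

The labels fall through into each other exactly like the case labels did, so `d = 3` copies three elements and `d = 0` jumps straight to the return.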
What? Gotos? Aren’t they considered harmful? Not if we know what we’re doing ;)
And we finally get the expected assembly.
Surely it’s going to be faster, right? It’s benchmark time!
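If you want to reproduce the comparison without a benchmarking framework, a rough timing harness along these lines is enough to see the trend. This is only a sketch, not the setup behind the measurements in this post: `ns_per_call` is a hypothetical helper, and cycling the tail length through 0..3 is my assumption about a reasonable input pattern (it is very regular, so a branch predictor may learn it).

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: calls fn(d) with d cycling through 0..3 and returns
// the average wall-clock time per call in nanoseconds.
// fn must return a float; iterations must be greater than zero.
template <typename F>
double ns_per_call(F&& fn, std::uint64_t iterations) {
    using clock = std::chrono::steady_clock;
    float sink = 0.0f;  // accumulate results so the calls are not optimized away
    const auto start = clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        sink += fn(static_cast<int>(i & 3));
    }
    const auto stop = clock::now();
    volatile float keep = sink;  // force the accumulated value to be materialized
    (void)keep;
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Pass it a lambda that wraps the version under test and returns one lane of the result, and run each variant several times, since single runs of such a tiny function are noisy.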
And now it’s time for the second disappointment.
Looks like the simple, if-based assembly that compilers generate is faster! Note that I’ve added a manually inlined version because, no matter how hard I tried, I couldn’t force the compilers to inline the goto version - not even the always_inline attribute helped. The manually inlined version obviously performed much better than the non-inlined one, but it’s still slower than the switch version.
It was really nice to play with the labels as values technique and fight the compiler, but, as is the case most of the time, such fights are best reserved for fun and exploration, not for production code.
Nice! Forgot computed goto's in C existed. I suppose I banned them from my brain, as they are non-standard. I'm curious what happens when jumping to labels across other functions? BTW long jumps (longjmp) are still useful to me, such as to implement exceptions in an interpreter of a functional language to exit deep calls. But that's another story.
interesting! thanks.