Large companies like Facebook and Google often maintain their own forks of C++ standard libraries and while there are many benefits of optimizing libraries for common internal use-cases, for Facebook, originally one of the major reasons was ability to perform small/short string optimization.
The fundamental idea is fairly simple - stuff as much data as possible into the data structure itself to avoid storing it on the heap. The most straightforward way to implement a string is to either use a vector of characters, which delegates memory management to vector implementation and exactly what Rust does,
or control all aspects manually by maintaining size, capacity and a pointer to an array of characters on the heap
In the case of using vector, we don’t have much wiggle room in terms of optimizations, since we totally rely on wisdom of vector creators, whereas manual version, while being messier to deal with, gives us total freedom to experiment.
First, let’s dedicate the least significant bit of the capacity field (__cap__
) to indicate the mode with 1 indicating a “normal” string layout
and 0 indicating a short string layout
In this short-string mode, we can reuse most of the space dedicated to a string data structure for storing characters inline. The easiest way to achieve this is via unions.
Because of this, the number of characters we can store inline depends on the sizes of capacity, size and pointer, which may be platform-dependent.
But we are performance practitioners, so we won’t be relying on implementation details and instead benchmark string creation of different sizes to see how much difference does it make in practice. The get the data we’ll use
and Quick C++ Benchmark. And the results are
and they clearly show that GNU’s libstdc++ implementation delivers ~4.5X better performance for strings up to 15 characters, while LLVM’s libc++ being much closer to the implementation presented above, have 24 byte wide strings, since capacity, size and a pointer are all using 8 bytes) with 1 byte dedicated to a mode and 1 used as a string terminator, can store strings up to 22 characters long inline. Benchmark supports this hypothesis
Unfortunately these optimizations are not available in most programming languages, including Rust, even though they can be incredibly valuable for lots of different data structures. In fact LLVM project employs a number of Small*
data structures including:
so its common for your data structures to be small, consider taking advantage of small object optimization after measuring its impact on performance.