We’ve discussed performance advantages of using Trie over std::unordered_set in
and some of the readers raised a valid concern about its memory consumption, especially in case of large alphabets, e.g. unicode. I’ve briefly mentioned some of the common approaches that are used to address these concerns, but in this article I wanted to provide more details about possibly the easiest one to implement - ternary search trie.
So the problem we are trying to address is the amount of memory we waste on each trie node that has only few children
A ternary search trie (TST) is a variation of a trie that uses three pointers instead of one for each child node. This makes ternary search trees more space-efficient than regular tries, but it also makes them slower for some operations.
Each node in a TST represents a single character in a string. The three pointers for each child node point to nodes that represent the next characters in the string, in the following order:
The left pointer points to a node that represents the next character in the string if the character is less than the character at the current node.
The equal pointer points to a node that represents the next character in the string if the character is equal to the character at the current node.
The right pointer points to a node that represents the next character in the string if the character is greater than the character at the current node.
Above implementation is recursive and is focused more on ease of understanding instead of performance, but just for fun let’s see how it compares to std::unordered_set in terms of insertion runtime
And looks like even our naive TST implementation manages to beat std::unordered_set
Don’t forget that for some characters TST nodes serve a routing role, which means that the traversal path can be 2X longer than the path in a regular trie. As such before deciding which implementation to use it would be a good idea to benchmark its insert and search runtime using the data representable of your production workloads.
How about a bloom filter?