Trie vs std::unordered_set.

Or thinking outside of standard library.

Apr 23, 2023

Imagine you have a bunch of registered users and your job is to quickly check if a particular user is in the list. Most, when faced with this question during a coding interview, would immediately suggest using hashtable, known as std::unordered_set in C++. And, while it’s not incorrect, our candidate hasn’t asked anything about the input and that’s already a negative signal. What if usernames contain only lowercase latin letters? Now that you know this, would you change your mind?

One of my favorite data structures is a trie - it’s trivial to implement even in C without any external dependencies. A quick-and-dirty version can look something like this

Surely such a simple data structure cannot compete with fancy std::unordered_set? Let’s see. We’ll benchmark insert and search of a word “hello” in data structures that don’t and do contain this word.

And the results are fairly impressive

As you can see, our naive trie leaves std::unordered_set in the dust with 440X faster lookups.

This solid foundation is also easy to extend. For example, if we had to count the number of occurrences, we can replace boolean is_end_of_word with a counter. We can also store a value instead of counter to turn it into a map. It’s also easy to support more complex strings by extending the number of nodes or use std::vector<std::pair<char, TrieNode*>> instead of std::array to support larger alphabet.

Data structures in standard library are useful, but they are also generic and don’t take your domain into account. Because of this they are less efficient and are rarely a good way to model business domain.

Software Bits Newsletter

Discussion about this post

Ready for more?