It’s very likely that searching for something is the most common task we have to program. We search for words in a document, values in associative dictionaries and so much more. C++ greatly simplifies our lives by providing all sorts of algorithms and data structures that implement these lookups. Even though built-in standard library dictionaries are not most efficient, one thing always bothered me more than anything else - lookup key types had to match exactly collection value types, or be convertible. While it seems to make sense, it causes some grief even in the most common cases when dictionaries with string keys are used.1
Say, we maintain word counts using an unordered map
In the end we’d like to query specific word, so why not use a literal?
And it works, but we are paying a cost of implicit string construction, which is probably not something we want. And besides, why does it matter that we are passing the same type? Doesn’t unordered map use hash code to locate the bucket and equality operator to make sure that keys match? If so, shouldn’t we be able to look up any type as long as it has compatible hash function and equality comparator? Yes and that’s what transparent comparators enable us to do.
All we need to do is to write a struct with all necessary hash operators and a special is_transparent type alias.
And parameterize our unordered_map using it and equal_to
These template parameters makes tell unordered_map how to compute hash codes and compare keys for equality.
Ok, now that the setup is complete, it’s time to see if it makes any difference.
And looks like it does! Querying using const char* is ~5.8X faster than using non-transparent lookups and even string queries are 1.1X faster than respective non-transparent lookups.
Actual performance difference depends on specific workload and types, so as always, measure first, but even if performance is not a factor in your decision making process, consider using this approach for better API ergonomics.
Disclaimer: I really really don’t like strings and believe that they are severely overused.
Thanks for posting. This is something that we all need to keep in mind.
It was added to 14 for the ordered containers (map, set) and was added to the unordered ones in 20.
Strings are severely what? :)