Faster matching name search in rustc.

Or using cheaper ways to maintain relations.

Sep 23, 2023

A mathematical relation is a relationship between two sets of numbers or elements. Relations are often expressed using ordered pairs (x, y). A function is a special type of relation where each x value is related to only one y value.

Relations and Functions - Definition, Types, and Examples

So when we write functions, we establish relationship between its parameters and its return values. In mathematics xs and ys are just symbols and functions do not have cost. In software it’s a different story - values take memory and function execution uses CPU cycles, so performance work is a search for cheaper way to establish relations.

That’s why we can use squares of distances between points instead of distances to avoid computing square roots. But today we’re going to use a different example. Rust is known for its developer friendly error messages, so we are going to look at

used for finding best suggestion candidates.

Note, that sort_by_words returns an owned String that is a formed by joining sorted parts of the original name split by underscore. Thanks to Rust’s lifetimes split returns slices of the name without new allocations, but, unfortunately, .join(“_”) cannot return a slice into name and as such has to create a new string.

But wait a second - do we actually need this joined string? Think about what kind of relation sort_by_words(name1) == sort_by_words(name2) establishes between name1 and name2. Is it any different than if we were to return split_words vector intead of joined string? It’s fairly obvious that for all strings name1 and name2, if vec![n11,n12,..] == vec![n21,n22,…] then “${n11}_${n12}_…}” == “${n21}_${n22}_…}” and the other way around. This means that we can rewrite sort_by_words as follows

and get the same result without unnecessary string allocations.

It’s time to see how much difference does it make using

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters

Show hidden characters

	use criterion::{criterion_group, criterion_main, Criterion};

	fn sort_by_words1(name: &str) -> String {
	let mut split_words: Vec<&str> = name.split('_').collect();
	// We are sorting primitive &strs and can use unstable sort here.
	split_words.sort_unstable();
	split_words.join("_")
	}

	fn sort_by_words2(name: &str) -> Vec<&str> {
	let mut split_words: Vec<&str> = name.split('_').collect();
	// We are sorting primitive &strs and can use unstable sort here.
	split_words.sort_unstable();
	split_words
	}

	fn bench_sorts(c: &mut Criterion) {
	let mut group = c.benchmark_group("multiply add");
	let name = "some_fancy_name";
	group.bench_function("original", \|b\| b.iter(\|\| sort_by_words1(name)));
	group.bench_function("proposed", \|b\| b.iter(\|\| sort_by_words2(name)));
	group.finish();
	}

	criterion_group!(benches, bench_sorts);
	criterion_main!(benches);

view raw sort_by_words_benchmark.rs hosted with ❤ by GitHub

and the results on M1 MacBook Air are

     Running benches/sort_by_words_benchmark.rs (target/release/deps/sort_by_words_benchmark-5adee33fa37ce7e2)
Benchmarking multiply add/original: Collecting 100 samples in estimated 5.0006 s (27M iteratio
multiply add/original   time:   [172.85 ns 177.30 ns 182.83 ns]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
Benchmarking multiply add/proposed: Collecting 100 samples in estimated 5.0000 s (61M iteratio
multiply add/proposed   time:   [81.106 ns 81.605 ns 82.261 ns]
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe

In summary we’re getting a 2X+ faster code without any downsides. That’s the power of relation modeling in action. PR has already landed in Rust and may already be processing your compiler errors.

Software Bits Newsletter

Discussion about this post