When it comes to parallel processing, mutability and sharing do not play well together, but when parallelism or mutations are not required, sharing data structures instead of copying them can be an easy way to reduce memory usage and improve performance.
To make things concrete, let’s solve a simple problem of converting text consisting of words separated by spaces to a form where all of its words are sorted lexicographically. The first thing that comes to mind is to:
split text into words
sort words
join words into a text
To split the text into words, we can use a simple linear scan:
that returns a Vec
of String
s corresponding to words. Since correctness is not the topic of this discussion, the following simple tests are enough to get the idea about what to expect:
Now that we have the words, it’s time to sort and join them to get the desired result:
as following tests show:
Now that the setup is ready, it’s time to run the benchmarks to get an idea about how efficient our implementation is
The numbers don’t look too bad
but do we really need to copy all words into a String
s? If we had to modify them, then we’d have no choice, but this problem does not require any mutations, so can we do better? Yes, we can! Rust conveniently offers a string slice (&str
) type to represent a view into a string, basically a pointer to a data start and its length, instead of copying its content.
Let’s implement the same logic as above using string slices:
Even though the logic is identical, we’ll use the same set of tests to make sure we haven’t missed anything obvious
We’ll also use the same set of benchmarks to compare how this new version performs relative to the original implementation
And results are demonstrating ~3X improvement
Impressive win given such a trivial change.
We don’t have to sort text words very frequently but this technique would be very handy for lexing and many other string processing algorithms and Rust’s lifetimes make it particularly easy and safe to use views, unlike C++’s string_view
that can easily lead to unpleasant surprises.
Full source is available on gist:
#![feature(test)] | |
extern crate test; | |
fn to_word_strings(text: &str) -> Vec<String> { | |
let mut start = 0; | |
let mut strings = Vec::new(); | |
for (idx, ch) in text.char_indices() { | |
if ch.is_whitespace() { | |
if idx - start > 0 { | |
strings.push(text[start..idx].to_owned()); | |
} | |
start = idx + 1; | |
} | |
} | |
if text.len() - start > 0 { | |
strings.push(text[start..text.len()].to_owned()); | |
} | |
strings | |
} | |
fn to_word_slices(text: &str) -> Vec<&str> { | |
let mut start = 0; | |
let mut slices = Vec::new(); | |
for (idx, ch) in text.char_indices() { | |
if ch.is_whitespace() { | |
if idx - start > 0 { | |
slices.push(&text[start..idx]); | |
} | |
start = idx + 1; | |
} | |
} | |
if text.len() - start > 0 { | |
slices.push(&text[start..text.len()]); | |
} | |
slices | |
} | |
fn sort_word_slices(text: &str) -> String { | |
let mut words = to_word_slices(text); | |
words.sort_unstable(); | |
words.join(" ") | |
} | |
fn sort_word_strings(text: &str) -> String { | |
let mut words = to_word_strings(text); | |
words.sort_unstable(); | |
words.join(" ") | |
} | |
#[cfg(test)] | |
mod tests { | |
use super::*; | |
use test::Bencher; | |
use rand::prelude::*; | |
#[test] | |
fn test_to_word_slices() { | |
assert_eq!(to_word_slices("hello world"), vec!["hello", "world"]); | |
} | |
#[test] | |
fn test_to_word_strings() { | |
assert_eq!( | |
to_word_strings("hello world"), | |
vec!["hello".to_owned(), "world".to_owned()] | |
); | |
} | |
#[test] | |
fn test_sort_word_slices() { | |
assert_eq!(sort_word_slices("world hello"), "hello world"); | |
assert_eq!(sort_word_slices(" world hello "), "hello world"); | |
} | |
#[test] | |
fn test_sort_word_strings() { | |
assert_eq!(sort_word_strings("world hello"), "hello world"); | |
assert_eq!(sort_word_strings(" world hello "), "hello world"); | |
} | |
#[bench] | |
fn bench_to_word_slices(bench: &mut Bencher) { | |
bench.iter(|| to_word_slices("hello world used for benchmarking")); | |
} | |
#[bench] | |
fn bench_to_word_strings(bench: &mut Bencher) { | |
bench.iter(|| to_word_strings("hello world used for benchmarking")); | |
} | |
#[bench] | |
fn bench_sort_word_slices(bench: &mut Bencher) { | |
bench.iter(|| sort_word_slices("hello world used for benchmarking")); | |
} | |
#[bench] | |
fn bench_sort_word_strings(bench: &mut Bencher) { | |
bench.iter(|| sort_word_strings("hello world used for benchmarking")); | |
} | |
} |
Don’t forget to subscribe to not miss the next article and let me know what would you like to learn more about.