Vectorized programming is a technique that uses vector instructions to operate on multiple data points at once, which can significantly improve the performance of programs that work with large amounts of data. Vector instructions are available on most modern CPUs; they let the CPU apply a single operation to multiple data elements in parallel, which can be much faster than processing them one at a time.
When it comes to SIMD, we usually think of the SSE family, AVX, and Neon, but vectorization was in use for years before any of them appeared. Let's take a look at a common parsing subroutine: checking whether a string has the hex prefix `0x`. The first thing that comes to mind is something like this
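(a minimal sketch of that obvious version; the `has_hex_prefix` name and the `std::string_view` signature are assumptions made for illustration)

```cpp
#include <string_view>

bool has_hex_prefix(std::string_view input) {
    // Check the two characters separately, accepting both "0x" and "0X".
    return input.size() >= 2 && input[0] == '0' &&
           (input[1] == 'x' || input[1] == 'X');
}
```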
which, by using a trick for fast lowercase transformation, can be reduced to
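(same assumptions as above; `c | 0x20` sets the bit that distinguishes ASCII lowercase from uppercase, so `'X'` becomes `'x'` and the two comparisons collapse into one)

```cpp
#include <string_view>

bool has_hex_prefix(std::string_view input) {
    // ASCII upper- and lowercase letters differ only in the 0x20 bit,
    // so OR-ing it in lowercases the second character for free.
    return input.size() >= 2 && input[0] == '0' && (input[1] | 0x20) == 'x';
}
```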
Clang generates the same assembly for both versions
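It looks roughly like this (a hand-written approximation of clang's optimized x86-64 output, not verbatim compiler output; registers and labels are illustrative):

```asm
has_hex_prefix:                      # approximation of clang -O2 output
        cmp     rsi, 2               # input.size() >= 2 ?
        jb      .not_hex
        cmp     byte ptr [rdi], 48   # input[0] == '0' ?
        jne     .not_hex
        movzx   eax, byte ptr [rdi + 1]
        or      al, 32               # input[1] | 0x20
        cmp     al, 120              # == 'x' ?
        sete    al
        ret
.not_hex:
        xor     eax, eax
        ret
```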
It may seem like there is nothing else to be done here. But we are dealing with just two single-byte characters, so why do we have to process them separately? What if we packed the first two chars into a 16-bit integer so that they could be processed as a single unit? And that's exactly what the ada URL parsing library does.
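A sketch of that approach, reconstructed from the description below rather than copied from ada's sources (the names are mine):

```cpp
#include <cstdint>
#include <cstring>
#include <string_view>

bool has_hex_prefix(std::string_view input) {
    if (input.size() < 2) return false;
    // Detect endianness at runtime (constexpr detection needs C++20).
    uint32_t probe = 1;
    bool is_little_endian = (reinterpret_cast<char*>(&probe)[0] == 1);
    uint16_t word_0x;
    std::memcpy(&word_0x, "0x", 2);            // the pattern, packed as uint16_t
    uint16_t first_two;
    std::memcpy(&first_two, input.data(), 2);  // the first two input bytes
    // Lowercase the *second* character: it sits in the high byte on
    // little-endian machines and in the low byte on big-endian ones.
    first_two |= is_little_endian ? 0x2000 : 0x0020;
    return first_two == word_0x;
}
```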
Since it's a generic library, it has to take endianness into account when converting the second character to lowercase with the `| 0x20` trick we used above: once the two bytes are packed into a `uint16_t`, that character lands in the high byte on little-endian machines and in the low byte on big-endian ones, so the mask becomes `0x2000` or `0x0020` respectively. Also, it's UB to `reinterpret_cast` string bytes into `uint16_t`, so `std::memcpy` has to be used until C++20's `std::bit_cast` is available.
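With C++20 the load could look like this (a hypothetical helper; `load_u16` is my name, not ada's):

```cpp
#include <array>
#include <bit>
#include <cstdint>

// std::bit_cast is well-defined and constexpr-friendly, unlike
// reinterpret_cast'ing the string bytes directly.
constexpr uint16_t load_u16(char a, char b) {
    return std::bit_cast<uint16_t>(std::array<char, 2>{a, b});
}
```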
Surely this implementation would result in a lot of assembly, right?
Turns out that optimizing compilers like clang easily get rid of the `memcpy`s, and in the end everything boils down to `and`, `cmp`, and `sete`.
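Roughly like this (again a hand-written approximation of optimized x86-64 output; clang folds the `| 0x2000` and the comparison into a single masked compare):

```asm
has_hex_prefix:                      # approximation of clang -O2 output
        cmp     rsi, 2               # input.size() >= 2 ?
        jb      .not_hex
        movzx   eax, word ptr [rdi]  # the input memcpy becomes one 16-bit load;
                                     # the pattern folds into a constant
        and     eax, 0xDFFF          # clear the case bit of the second byte
        cmp     eax, 0x5830          # "0X" packed little-endian, case bit cleared
        sete    al
        ret
.not_hex:
        xor     eax, eax
        ret
```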
In this article we only had to process 2 chars, but even without going all the way to something like AVX-512, we can easily process 8 chars at a time using a `uint64_t`, which is available on most platforms.
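For instance, here's a hypothetical case-insensitive check for an 8-character `https://` prefix built on the same trick (my example, not from the article or ada):

```cpp
#include <cstdint>
#include <cstring>
#include <string_view>

bool has_https_prefix(std::string_view input) {
    if (input.size() < 8) return false;
    uint64_t pattern;
    std::memcpy(&pattern, "https://", 8);       // 8 chars fit a uint64_t exactly
    uint64_t first_eight;
    std::memcpy(&first_eight, input.data(), 8);
    // '/' (0x2F) and ':' (0x3A) already have the 0x20 bit set, so OR-ing
    // every byte with 0x20 lowercases the letters without touching them.
    // Both values are packed the same way, so endianness cancels out.
    return (first_eight | 0x2020202020202020ULL) == pattern;
}
```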