2 Comments
Yacob Cohen-Arazi

It's funny. Initially I was skimming fast and thought you'd be using an array for the loop; I did not realize that it was a vector. Interesting finds though. There seems to be a limit past which the compiler/optimizer(?) says "I'm not doing this nice optimization anymore" and falls back to regular allocation.

So in the last snippet, gcc is using jne. Is this worse in terms of the CPU branch predictor?

Taras Tsugrii

I'm afraid I'm going to have to be very non-creative and answer "it depends" :) When it comes to branch prediction, I usually expect modern CPUs to do well, but as I've been reminded numerous times: trust but verify. Only proper performance testing on the specific hardware and with the specific binary layout can tell whether it's fast in practice.
