It's funny. Initially I was skimming fast and thought you'd be using an array for the loop; I didn't realize it was a vector. Interesting finds, though. There is a limit where the compiler (optimizer?) says "I'm not doing this nice optimization anymore" and falls back to a regular heap allocation.
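Here is a minimal sketch of the kind of pattern being discussed. It assumes the snippet looked roughly like this; the function name `sum_first_n` and the sizes are hypothetical and not from the original. Since C++14 the standard gives compilers some latitude to elide dynamic allocations. Some compilers can remove the allocation behind a small `std::vector` that never escapes the function, but they are not required to. Whether the heap allocation disappears, and up to what size, depends on the compiler, its version and the flags:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical example: a vector of compile-time size N that never escapes.
// The optimizer *may* fold the whole thing into a constant and drop the
// allocation, or it may keep the call to operator new; nothing guarantees either.
template <std::size_t N>
long long sum_first_n() {
    std::vector<long long> v(N);
    std::iota(v.begin(), v.end(), 1LL);            // fills 1, 2, ..., N
    return std::accumulate(v.begin(), v.end(), 0LL);
}
```

One way to look for that limit is to compile `sum_first_n<8>()` and `sum_first_n<100000>()` at `-O2` and check in the disassembly (e.g. on Compiler Explorer) whether calls to `operator new` remain.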
So in the last snippet, GCC is using jne. Is this worse in terms of the CPU branch predictor?
I'm afraid I'll have to be very uncreative and answer "it depends" :) When it comes to branch prediction, I usually expect modern CPUs to do well, but as I've been reminded numerous times: trust but verify. Only proper performance testing on the specific hardware, with the specific binary layout, can tell whether it's fast in practice.
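To make "trust but verify" concrete, here is a rough sketch of a microbenchmark. It is a generic illustration, not the code from the snippet. All function names (`sum_above`, `random_data`, `best_time_ns`) are hypothetical. It times a loop containing a data-dependent branch, once on random data and once on sorted data:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <vector>

// Sums the elements >= threshold. The `if` is a data-dependent branch; depending
// on compiler and flags it may become a conditional jump or a branch-free cmov,
// so check the generated assembly as well as the timings.
std::int64_t sum_above(const std::vector<int>& data, int threshold) {
    std::int64_t total = 0;
    for (int x : data) {
        if (x >= threshold) total += x;
    }
    return total;
}

// Uniformly random values in [0, 255] from a fixed seed, so runs are repeatable.
std::vector<int> random_data(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 255);
    std::vector<int> v(n);
    for (int& x : v) x = dist(gen);
    return v;
}

// Best-of-`reps` wall-clock time in nanoseconds for sum_above(data, threshold).
// The volatile sink stops the compiler from discarding the computed sum.
long long best_time_ns(const std::vector<int>& data, int threshold, int reps) {
    static volatile std::int64_t sink = 0;
    long long best = std::numeric_limits<long long>::max();
    for (int r = 0; r < reps; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        sink = sum_above(data, threshold);
        auto t1 = std::chrono::steady_clock::now();
        auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        best = std::min(best, static_cast<long long>(ns));
    }
    return best;
}
```

For example, call `best_time_ns(data, 128, 20)` on `random_data(1 << 20, 42)`, then call it again on a `std::sort`-ed copy. A large gap suggests that the branch predictor struggles with the random input. No gap may simply mean the compiler emitted a cmov instead of a jump. Either way, the result holds only for that machine and that binary.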