A common reason why code involving character sequences isn't vectorized is demonstrated by the following example. find_a() is a toy function that advances the current_index pointer to the next occurrence of 'a':
#include &lt;string&gt;

void find_a(size_t *current_index, const std::string &code) {
    const size_t code_size = code.size();
    size_t ci = *current_index;
    while (ci < code_size && code[ci] != 'a') { ci++; }
    *current_index = ci;
}
#include &lt;cstring&gt;
#include &lt;string&gt;

void find_a_vectorize(size_t *current_index, const std::string &code) {
    const size_t code_size = code.size();
    size_t ci = *current_index;
    // ci + 128 <= code_size rather than ci < code_size - 128:
    // the subtraction underflows (size_t) when code_size < 128
    while (ci + 128 <= code_size) {
        char cpy[128];
        std::memcpy(cpy, &code[ci], 128);
        bool found = false;
        for (int i = 0; i < 128; i++) {
            if (cpy[i] == 'a') { found = true; }
        }
        if (found) {
            break;
        }
        ci += 128;
    }
    while (ci < code_size && code[ci] != 'a') { ci++; }
    *current_index = ci;
}
The reason you have to write the second version to get auto-vectorization is that the compiler doesn't know the length of the array inside the string: vectorizing the first loop might read past the end of the array. As a consequence, loops with an early break or return can't be auto-vectorized, with a detrimental impact on performance. I don't think I need to explain why the second version is objectively worse.
I scoured the web for a solution, and even asked ChatGPT (it recommended solving it with __restrict, smart). I'm looking for something like __builtin_guaranteed_minimal_array_length(&code[0], code_size). Since it would only be a hint, a solution for just Clang would be sufficient. Is there a way?