Are indexes easier to vectorize than pointers?

166 Views Asked by At

Is there any example (e.g. on https://godbolt.org/ ) where CLang generates worse code when an algorithm expressed by pointer iterations instead of array indexes? E.g. it can vectorize/unfold in one case but can't in the other one?

In simple examples apparently it doesn't matter. Here is a pointer iteration style:

while (len-- > 0) {
  *dst++ = *src++;
}

Here is the logically same code in index style:

while (idx != len) {
  dst[idx] = src[idx];
  idx++;
}

Disregard any UB and/or off by one errors here.

Edit: the argument about indices being sugar is irrelevant, as desugraing doesn't change the algorithm style. So the following pointer based code is still in the index style:

while (idx != len) {
  *(dst + idx) = *(src + idx);
  idx++;
}

Note that the index-based loop has only 1 changing variable, while the pointer-based loop has 2, and the compiler must infer that they always change together.

You should look at this in the context of https://en.wikipedia.org/wiki/Induction_variable and https://en.wikipedia.org/wiki/Strength_reduction. Pointer style is essentially strength-reduced index-style, as addition is replaced by increments. And this reduction was beneficial for performance for some time, but no longer.

So my question boils down to if there are situations when this strength reduction cannot be performed or reversed by a compiler.

Another possible case is when indexes are not induction variables. So corresponding pointer code includes "arbitrary jumps" and it's somehow harder to transform the loop due to "history" of past iterations.

1

There are 1 best solutions below

0
On

As long as no overloaded operator [] is involved, a subscript expression is literally defined to be identical to pointer arithmetic followed by dereferencing the result [expr.sub]/1. Thus, as long as both versions are indeed equivalent, compilers should generally be able to optimize both versions equally well (I'd probably go as far as considering a compiler's failure to optimize one but not the other a performance bug). That being said, note that there are lots of subtleties such as the wrap-around behavior of unsigned arithmetic that can make iterating over an index not exactly equivalent to iterating over a pointer…