Why do different types of array subscript used to iterate affect auto-vectorization?


As the following code shows, using uint32_t for the array subscript prevents the compiler (GCC 12.1 with -O3) from auto-vectorizing the loop. Why? See godbolt.

#include <cstdint>

// no auto vectorization
void test32(uint32_t *array, uint32_t &nread, uint32_t from, uint32_t to) {
    for (uint32_t i = from; i < to; i++) {
        array[nread++] = i;
    }
}

// auto vectorization
void test64(uint32_t *array, uint64_t &nread, uint32_t from, uint32_t to) {
    for (uint32_t i = from; i < to; i++) {
        array[nread++] = i;
    }
}

// no auto vectorization
void test_another_32(uint32_t *array, uint32_t &nread, uint32_t from, uint32_t to) {
    uint32_t index = nread;
    for (uint32_t i = from; i < to; i++) {
        array[index++] = i;
    }
    nread = index;
}

// auto vectorization
void test_another_64(uint32_t *array, uint32_t &nread, uint32_t from, uint32_t to) {
    uint64_t index = nread;
    for (uint32_t i = from; i < to; i++) {
        array[index++] = i;
    }
    nread = index;
}

After running g++ -O3 -fopt-info-vec-missed -c test.cc -o /dev/null, I got the following result. How should I interpret it?

bash> g++ -O3 -fopt-info-vec-missed -c test.cc -o /dev/null
test.cc:5:31: missed: couldn't vectorize loop
test.cc:6:24: missed: not vectorized: not suitable for scatter store *_5 = i_18;
test.cc:21:31: missed: couldn't vectorize loop
test.cc:22:24: missed: not vectorized: not suitable for scatter store *_4 = i_22;

Answer by Goswin von Brederlow

Look at the function

void test32(uint32_t *array, uint32_t &nread, uint32_t from, uint32_t to)

and how it should behave if you call it like this:

uint32_t arr[16] = {};
test32(arr, arr[3], 0, 15);

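This is called aliasing: nread refers to arr[3], so any store through array may change nread, and the compiler has to re-read it on every iteration. The nread parameter may alias an element of array because both have type uint32_t. But when you have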

void test64(uint32_t *array, uint64_t &nread, uint32_t from, uint32_t to)

then the compiler may assume that no aliasing occurs, because under the strict aliasing rule a uint32_t and a uint64_t cannot validly occupy the same address.

Note: a reference parameter is passed internally as an address, so for the purposes of aliasing it is equivalent to a pointer.
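To make the hazard concrete, here is a minimal sketch. fill_explicit is a hypothetical helper, not part of the question: it does the same work as test32 but splits array[nread++] = i into separate statements so that the order of the counter update and the store is explicit. run_separate and run_aliased are likewise illustrative names.

```cpp
#include <array>
#include <cstdint>

// Same work as test32, with the read, the bump and the store spelled out.
void fill_explicit(uint32_t *array, uint32_t &nread, uint32_t from, uint32_t to) {
    for (uint32_t i = from; i < to; i++) {
        uint32_t slot = nread;  // has to be re-read on every iteration
        nread = slot + 1;       // bump the counter
        array[slot] = i;        // may overwrite the counter if it lives inside array
    }
}

// The counter is a separate variable: stores into the array cannot touch it.
std::array<uint32_t, 16> run_separate(uint32_t &count_out) {
    std::array<uint32_t, 16> arr{};
    uint32_t n = 0;
    fill_explicit(arr.data(), n, 0, 8);
    count_out = n;
    return arr;
}

// The counter is arr[3]: the store for slot 3 lands on the counter itself.
std::array<uint32_t, 16> run_aliased() {
    std::array<uint32_t, 16> arr{};
    fill_explicit(arr.data(), arr[3], 0, 8);
    return arr;
}
```

With a separate counter the first eight elements become 0..7 and the counter ends at 8. With the counter at arr[3], the store for i = 3 overwrites the counter, so the final contents differ. The compiler has to preserve that behavior, which is why it cannot simply keep nread in a register.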

There are some types with special rules, called aliasing types. The C++ standard says that you can cast a uint32_t* to char* and then access the raw memory underlying the uint32_t. That means a uint32_t* and a char* can legally point at the same address. char* is an aliasing type because it may alias any other type of (data) pointer. So are unsigned char* and std::byte*.
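As an illustration of that special rule, the sketch below reads the bytes of a uint32_t through an unsigned char*. byte_sum is a hypothetical name chosen for this example. Summing the bytes keeps the result independent of endianness.

```cpp
#include <cstddef>
#include <cstdint>

// Sum the individual bytes of a uint32_t by reading them through
// unsigned char*, which is allowed to alias an object of any type.
uint32_t byte_sum(const uint32_t &value) {
    const unsigned char *bytes = reinterpret_cast<const unsigned char *>(&value);
    uint32_t sum = 0;
    for (std::size_t k = 0; k < sizeof value; k++) {
        sum += bytes[k];
    }
    return sum;
}
```

For example, byte_sum(0x01020304u) adds 1 + 2 + 3 + 4 in whatever order the bytes are stored. The flip side is that a loop writing through a char* forces the compiler to assume that any other object may have changed.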

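But you can tell the compiler that two pointers (or references) are not allowed to alias, even if their types would permit it, by using restrict. Note that restrict is a C keyword; C++ has no standard equivalent, but GCC accepts __restrict (or __restrict__) as an extension, on references as well as on pointers.

void test32(uint32_t *array, uint32_t &__restrict nread, uint32_t from, uint32_t to)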
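A complete version might look like the sketch below. It assumes GCC or Clang, where the extension is spelled __restrict in C++ (plain restrict is a C keyword and does not compile as C++). test32_restrict is a name made up here to keep it apart from the original function.

```cpp
#include <cstdint>

// __restrict is a compiler extension, not standard C++. It is a promise by
// the caller that, inside this function, nread is not accessed through
// array or any other pointer. Breaking the promise is undefined behavior.
void test32_restrict(uint32_t *array, uint32_t &__restrict nread,
                     uint32_t from, uint32_t to) {
    for (uint32_t i = from; i < to; i++) {
        array[nread++] = i;
    }
}
```

A call such as test32_restrict(arr, arr[3], 0, 15) would now violate the contract, so the counter has to be a separate object. Whether GCC then vectorizes the loop is best confirmed by rebuilding with -fopt-info-vec-missed, as in the question.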

PS: test_another_32 looks like a missed compiler optimization.