I'm trying to get some of my code to vectorize, but I keep running into info C5002: loop not vectorized due to reason '1305'. According to this page:
// Code 1305 is emitted when the compiler can't discern proper vectorizable type information for this loop.
(I'm using Visual Studio Community 2022)
I decided to experiment with some non-functional code to better understand why this was happening, but this error seems to pop up in code that should be obviously typed, and easy to vectorize. This is my code:
int vecTest() {
    int v0[128] alignas(16);
    int v1[128] alignas(16);
    int v2[128] alignas(16);
    int sum = 0;
    for (int i = 0; i < 128; i++) {
        v0[i] = i-1;
        v1[i] = i*2;
    }
    for (int i = 0; i < 128; i++) {
        v2[i] = v0[i] + v2[i];
    }
#ifdef CASE_TWO
    int* pv0 = &v0[0];
    int* pv1 = &v1[0];
    int* pv2 = &v2[0];
    for (int i = 0; i < 128; i++) {
        pv2[i] = pv0[i] + pv2[i];
    }
#endif
    sum += v2[0];
    return sum;
}
int main(int argc, char* argv[])
{
    int sum = vecTest();
    sum = sum + 1;
}
If CASE_TWO is absent, the first (initialization) loop will vectorize, but the second will return code 1305. However, adding the contents of CASE_TWO causes all three loops to vectorize properly! Additionally, including the CASE_TWO code and excluding the second loop causes CASE_TWO to return 1305.
It seems to me that none of these loops should have trouble being vectorized, and that they shouldn't affect each other. What am I missing?
What is the actual meaning of code 1305 and "proper vectorizable type information", and does the compiler actually behave in the manner suggested by the documentation?
I'm using default compiler settings, except for /O2 and /Qvec-report:2.
If you look at the asm (on Godbolt), we can see MSVC folded the two loops together, so there is no separate init loop. It just computes v0[i] on the fly, adding into the uninitialized v2[i] (a vector load and store from the space it allocated but never wrote).
It reports the first loop getting vectorized and the second not, but really it's fusing them into one asm loop. The work in those loops all gets vectorized, so this is arguably a bug in its reporting. (Except for optimizing away the unused v1[i] = i*2; that nothing ever reads.)
By comparison, GCC isn't that clever and does allocate space for both v2 and v0 (its sub rsp, 928 plus the 128-byte red zone is just over 1024 = 2 * 128 * sizeof(int)). MSVC allocated space for v2, not v0 (its sub rsp, 536 is just over 128 * sizeof(int) = 512). Neither compiler allocated space for the unused v1; IDK why that's cluttering up your example.
Clang optimizes away everything (including the return value, because reading uninitialized v2[] is UB in C++, or at least indeterminate, so it can leave whatever garbage it wants in EAX as the return value). With alignas(16) int v2[128] = {}; clang still optimizes away the arrays, just returning -1. https://godbolt.org/z/E9v1evE94 - clang requires the standard alignas(128) int v0[]; syntax, not allowing the alignas to go after the declaration. GCC and MSVC allow that.
With init of v2, MSVC does call memset for that, but then still makes the same single loop that materializes v0[i] on the fly to add into v2[i].