VC++ no longer vectorizes simple for loops with range-based syntax


Before replacing a lot of my "old" for loops with range-based for loops, I ran some tests with Visual Studio 2013:

std::vector<int> numbers;

for (int i = 0; i < 50; ++i) numbers.push_back(i);

int sum = 0;

//vectorization
for (auto number = numbers.begin(); number != numbers.end(); ++number) sum += *number;

//vectorization
for (auto number = numbers.begin(); number != numbers.end(); ++number) {
    auto && ref = *number;
    sum += ref;
}

//definition of range based for loops from http://en.cppreference.com/w/cpp/language/range-for
//vectorization
for (auto __begin = numbers.begin(),
    __end = numbers.end();
    __begin != __end; ++__begin) {
    auto && ref = *__begin;
    sum += ref;
}

//no vectorization :(
for (auto number : numbers) sum += number;

//no vectorization :(
for (auto& number : numbers) sum += number;

//no vectorization :(
for (const auto& number : numbers) sum += number;

//no vectorization :(
for (auto&& number : numbers) sum += number;

printf("%d\n", sum); // sum is an int, so %d rather than %f

Looking at the disassembly, the standard for loops were all vectorized:

00BFE9B0  vpaddd      xmm1,xmm1,xmmword ptr [eax]  
00BFE9B4  add         ecx,4  
00BFE9B7  add         eax,10h  
00BFE9BA  cmp         ecx,edx  
00BFE9BC  jne         main+140h (0BFE9B0h)  

but the range-based for loops were not:

00BFEAC6  add         esi,dword ptr [eax]  
00BFEAC8  lea         eax,[eax+4]  
00BFEACB  inc         ecx  
00BFEACC  cmp         ecx,edi  
00BFEACE  jne         main+256h (0BFEAC6h)  

Is there any reason why the compiler couldn't vectorize these loops?

I would really like to use the new syntax, but losing vectorization is too high a price.

I just saw this question, so I tried the /Qvec-report:2 flag, which reports a different reason:

loop not vectorized due to reason '1200'

that is:

Loop contains loop-carried data dependences that prevent vectorization. Different iterations of the loop interfere with each other such that vectorizing the loop would produce wrong answers, and the auto-vectorizer cannot prove to itself that there are no such data dependences.
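To make the wording of reason 1200 concrete, here is a minimal sketch of the difference between a plain sum and a loop that really does carry a dependence from one iteration to the next. The function names `sum_all` and `prefix_sum_in_place` are made up for illustration; neither comes from the test above.

```cpp
#include <cstddef>
#include <vector>

// Reduction: every iteration only adds one element to `sum`. Integer
// addition can be regrouped, so several elements can be added per step.
int sum_all(const std::vector<int>& v) {
    int sum = 0;
    for (std::size_t i = 0; i < v.size(); ++i) sum += v[i];
    return sum;
}

// Genuine loop-carried dependence: iteration i reads the value that
// iteration i - 1 just wrote, so iterations cannot simply run side by side.
void prefix_sum_in_place(std::vector<int>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) v[i] += v[i - 1];
}
```

The loops in my test are all of the first kind, which is why a 1200 diagnostic on them looks like the vectorizer failing to prove independence rather than a real dependence.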

Is this the same bug? (I also tried with the latest VC++ compiler, the "Nov 2013 CTP".)

Should I report it on MS Connect too?

edit

Due to the comments, I ran the same test with a raw int array instead of a vector, so no iterator class is involved, just raw pointers.

Now all loops are vectorized except the two "simulated range-based" loops.

Compiler says this is due to reason '501':

Induction variable is not local; or upper bound is not loop-invariant.
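As a rough illustration of what "upper bound is not loop-invariant" can mean, consider the sketch below. Both functions are hypothetical and not part of my test, and I am only assuming that a vectorizer would treat them differently; I have not checked which reason code, if any, VC++ reports for them.

```cpp
#include <cstddef>

// The bound is re-read through a pointer on every iteration. Because `dst`
// and `n` have the same pointee type, a store to dst[i] could in principle
// change *n, so the compiler cannot easily prove the bound is fixed.
void scale_reload_bound(int* dst, const int* n) {
    for (int i = 0; i < *n; ++i) dst[i] *= 2;
}

// The bound is copied into a local before the loop, so it is trivially
// loop-invariant and the induction variable `i` is local as well.
void scale_local_bound(int* dst, const int* n) {
    const int count = *n;
    for (int i = 0; i < count; ++i) dst[i] *= 2;
}
```

In my simulated range-based loops, both `__begin` and `__end` are already locals declared in the for statement, which is why reason 501 surprises me.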

I don't get what's going on...

const size_t size = 50;
int numbers[size];

for (size_t i = 0; i < size; ++i) numbers[i] = i;

int sum = 0;

//vectorization
for (auto number = &numbers[0]; number != &numbers[0] + size; ++number) sum += *number;

//vectorization
for (auto number = &numbers[0]; number != &numbers[0] + size; ++number) {
    auto && ref = *number;
    sum += ref;
}

//definition of range based for loops from http://en.cppreference.com/w/cpp/language/range-for
//NO vectorization ?!
for (auto __begin = &numbers[0],
    __end = &numbers[0] + size;
    __begin != __end; ++__begin) {
    auto && ref = *__begin;
    sum += ref;
}

//NO vectorization ?!
for (auto __begin = &numbers[0],
    __end = &numbers[0] + size;
    __begin != __end; ++__begin) {
    auto && ref = *__begin;
    sum += ref;
}

//vectorization ?!
for (auto number : numbers) sum += number;

//vectorization ?!
for (auto& number : numbers) sum += number;

//vectorization ?!
for (const auto& number : numbers) sum += number;

//vectorization ?!
for (auto&& number : numbers) sum += number;

printf("%d\n", sum); // sum is an int, so %d rather than %f

There is 1 best solution below


My guess is that a range-based for loop does not know offhand whether the object is a vector, an array, or a linked list, so the compiler cannot decide beforehand to vectorize the loop. Range-based for loops are the equivalent of the foreach loop in other languages. There might be a way to hint to the compiler that the loop should be vectorized, using a macro, a pragma, or a compiler setting. To check, try the code with other compilers and see what you get; I would not be surprised if other compilers also produce non-vectorized assembly.
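As a sketch of the pragma idea: Visual C++ has a `#pragma loop(ivdep)` hint that asks the auto-vectorizer to ignore assumed vector dependences for the loop that follows. Whether it actually changes the outcome for a range-based for loop in VS 2013 is an assumption I have not verified, and the function name `sum_with_hint` is made up for the example. The pragma is guarded so that other compilers simply skip it.

```cpp
#include <vector>

// Sums the elements, asking MSVC (only) to ignore assumed loop-carried
// dependences. On other compilers this is a plain range-based loop.
int sum_with_hint(const std::vector<int>& numbers) {
    int sum = 0;
#ifdef _MSC_VER
#pragma loop(ivdep)
#endif
    for (int number : numbers) sum += number;
    return sum;
}
```

Building with /Qvec-report:2 before and after adding the hint should show whether the reason code changes.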