Why does clang optimize out a loop polling a variable that another thread writes to?

While studying C++, I found something weird. I thought the code below would print a big number (at least not 1.1).
Instead, it printed 1.1.

Other compilers worked as expected, but Clang with aggressive optimization seems to ignore the while loop.
So my question is: what's the problem with my code? Or is this intended behavior in Clang?

I used the Apple Clang compiler (v14.0.3).

#include <chrono>
#include <iostream>
#include <thread>

static bool should_terminate = false;

void infinite_loop() {
    long double i = 1.1;
    while(!should_terminate)
        i *= i;
    std::cout << i;
}

int main() {
    std::thread(infinite_loop).detach();
    std::cout << "main thread";
    for (int i = 0 ; i < 5; i++) {
        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::cout << ".";
    }
    should_terminate = true;
}

Assembly output from Compiler Explorer (Clang v16.0.0, -O3), which also skips the while loop:

_Z13infinite_loopv:                     # @_Z13infinite_loopv
        sub     rsp, 24
        fld     qword ptr [rip + .LCPI0_0]
        fstp    tbyte ptr [rsp]
        mov     rdi, qword ptr [rip + _ZSt4cout@GOTPCREL]
        call    _ZNSo9_M_insertIeEERSoT_@PLT
        add     rsp, 24
        ret

Answer by user17732522 (best answer)

Your code has undefined behaviour:

should_terminate is not an atomic object, so writing to it in one thread and accessing it in another thread potentially concurrently (i.e. without any synchronization) is a data race, which is always undefined behaviour.

Practically speaking this UB rule permits the compiler to make exactly the optimization you see here.

The compiler can assume that should_terminate will never change in the loop, because it cannot possibly be written to from another thread since that would be a data race. So when reaching the loop it is either false and stays false, so that the loop never terminates, or it is true, in which case the loop body doesn't execute at all.
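To make that reasoning concrete, here is a hand-written sketch of the question's loop as the optimizer is allowed to treat it once it assumes no other thread writes should_terminate. The function name loop_after_hoisting is hypothetical, and it returns i instead of printing it so the result can be checked; it illustrates the permitted transformation and is not Clang's literal output.

```cpp
// Hypothetical sketch: the question's loop after the compiler hoists the
// (assumed race-free, hence unchanging) read of should_terminate out of it.
static bool should_terminate = false;

long double loop_after_hoisting() {
    long double i = 1.1;
    if (!should_terminate) {  // flag read once, before the loop
        for (;;)              // never re-reads the flag, so it can never exit
            i *= i;
    }
    return i;                 // reached only if the flag was already true
}
```

The two cases described above are visible directly: either the flag is true and i is returned unchanged, or control enters a loop with no exit.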

Then, because an infinite loop that doesn't perform any atomic/IO/volatile/synchronization operation would also have UB, the compiler can further deduce that should_terminate must be (always) true when the loop is reached. Consequently the loop body can never be executed and removing the loop is a permitted optimization.

So Clang is behaving correctly here, and your expectations are wrong. should_terminate must be a std::atomic<bool> (or std::atomic_flag) so that writing to it in one thread while another thread reads it, without any other synchronization, is not a data race.
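A minimal sketch of that fix, assuming the question's loop is otherwise unchanged: the flag becomes a std::atomic<bool>, so the load in the loop condition has to be performed on every iteration. Two changes are assumptions made for this example: infinite_loop returns i instead of printing it, and the hypothetical helper run_for joins the worker instead of detaching it, so the result is read only after the thread has finished.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Atomic flag: a store in one thread and loads in another are not a data race,
// so the compiler may not assume the value stays the same across iterations.
static std::atomic<bool> should_terminate{false};

// The question's loop, returning i instead of printing it.
long double infinite_loop() {
    long double i = 1.1;
    while (!should_terminate.load())  // fresh atomic load on every iteration
        i *= i;
    return i;
}

// Hypothetical helper: run infinite_loop() on a worker thread for `duration`,
// then request termination and wait for the worker to finish.
long double run_for(std::chrono::milliseconds duration) {
    long double result = 0;
    std::thread worker([&result] { result = infinite_loop(); });
    std::this_thread::sleep_for(duration);
    should_terminate.store(true);  // the worker will eventually observe this store
    worker.join();                 // join makes the write to result visible here
    return result;
}
```

For example, run_for(std::chrono::milliseconds(100)) typically returns a value far larger than 1.1 (the repeated squaring quickly overflows to infinity), because the loop now actually runs until the flag is set.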

Answer by Kaushal Singh

Without a synchronization mechanism or an atomic type, the should_terminate variable will not behave as you expect. With a mutex added for synchronization, the loop in the code below is actually executed instead of being optimized away.

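#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex mu;  // protects every access to should_terminate
static bool should_terminate = false;

void infinite_loop() {
    long double i = 1.1;
    while (true) {
        {
            // Lock only around the read; holding the lock for the whole loop
            // would keep main() from ever acquiring it to set the flag.
            std::lock_guard<std::mutex> lock(mu);
            if (should_terminate)
                break;
        }
        std::cout << "From Child thread" << std::endl;
        i *= i;
    }
    std::cout << i;
}

int main() {
    std::thread(infinite_loop).detach();
    std::cout << "main thread";
    for (int i = 0; i < 5; i++) {
        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::cout << ".";
    }
    std::lock_guard<std::mutex> lock(mu);  // write the flag under the same mutex
    should_terminate = true;
}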