TL;DR: Why does this https://godbolt.org/z/ohK31hW34 multithreaded program segfault?
Explanation: I am encountering weird behavior of my multithreaded C++ application. The application has multiple threads that loop in a while loop guarded by an std::atomic<bool>
variable. I use this construct in multiple places, so I extracted it to a simple ThreadLoop
class with Start(function)
& Stop()
methods.
class ThreadLoop
{
public:
ThreadLoop(const std::string& name) : mName(name) {}
~ThreadLoop() { Stop(); }
template <typename F>
void Start(F&& function)
{
if (mRunning)
return;
std::scoped_lock lock(mMutex);
if (mThread.joinable())
mThread.join();
mRunning = true;
mThread = std::thread([&]() {
while (mRunning)
{
function();
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
});
}
void Stop()
{
if (not mRunning)
return;
mRunning = false;
std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
private:
std::atomic<bool> mRunning = false;
std::mutex mMutex;
std::thread mThread;
std::string mName;
};
I then use an object of this custom class as a member in a different 'worker' class that assigns a particular function to be periodically executed, like so
class Worker1
{
public:
void StartWorking()
{
mThread.Start([this]() { Work(); });
}
void StopWorking() { mThread.Stop(); }
private:
ThreadLoop mThread{"worker1 loop"};
void Work()
{
fmt::print("Working...\n");
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
};
I have all of these 'workers' in another class and call StartWorking()
/ StopWorking()
on them at random time points (also in a ThreadLoop
)
class Main
{
public:
void Start()
{
mThread.Start([this]() { MainLoop(); });
}
void Stop() { mThread.Stop(); }
private:
ThreadLoop mThread{"main loop"};
Worker1 mWorker1;
void MainLoop()
{
if (/*something*/)
mWorker1.StartWorking();
else
mWorker1.StopWorking();
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
};
The first ThreadLoop
("main loop") in the Main
class starts fine and starts calling StartWorking()
/ StopWorking()
, as expected. The StartWorking()
then triggers the worker to start its own ThreadLoop
("worker1 loop"), which fails undeterministically inside the ThreadLoop::Start()
function, for example via
`../nptl/pthread_mutex_lock.c:81: __pthread_mutex_lock: Assertion mutex->__data.__owner == 0 failed`
Also, according to the debugger, the entire ThreadLoop
object seems to be uninitialized / destroyed (e.g. the std::string mName
variable is empty, although I always provide a non-empty string) - this probably causes the std::scoped_lock
to fail - locking an uninitialized/destroyed mutex. My question is, how/why is the ThreadLoop
object uninitialized? I think I clearly construct it as a member inside each Worker1
object?
The issue was that I was capturing the
T&& function
by reference in the lambda insideThreadLoop::Start()
, and the thread then invoked the lambda, which was a dangling reference. The fix was to captureT&& function
by value.Original version:
Fixed version:
Even better version: