I have a thread that dumps images as raw data to disk. It works fine for a few minutes and then suddenly it just stops doing anything.
From console output, I found that it stops at a different, random position within the loop each time.
The program doesn't crash inside this thread (it crashes shortly after the thread stops, because my image buffer fills up), so I get no error, exception, or anything else from the thread.
Here's a sketch of my code:
```cpp
class ImageWriter
{
public:
    // constructor, destructor
    void continueWriting();
private:
    void writeImages();
    std::thread m_WriterThread;
    bool m_WriterThreadRunning;
    std::mutex m_ThreadRunningMutex;
    ImageManager * m_ImageManager;
    unsigned int m_uiCamId; // camera ID passed to getNextImage()
};

void ImageWriter::continueWriting()
{
    // whenever a new image is acquired, this function is called
    // so if the thread has finished, it needs to be restarted
    // this function is also used for the first start of writing
    m_ThreadRunningMutex.lock();
    if ( m_WriterThreadRunning )
    {
        m_ThreadRunningMutex.unlock();
    }
    else
    {
        m_ThreadRunningMutex.unlock();
        if( m_WriterThread.joinable() )
        {
            m_WriterThread.join();
        }
        m_WriterThreadRunning = true;
        m_WriterThread = std::thread( &ImageWriter::writeImages, this );
    }
}

void ImageWriter::writeImages()
{
    while ( true )
    {
        // MyImage is a struct that contains the image pointer and some metadata
        std::shared_ptr< MyImage > imgPtr = m_ImageManager->getNextImage(m_uiCamId);
        if( imgPtr == nullptr )
        {
            // this tells the ImageWriter that currently there are no further images queued
            break;
        }
        // check whether the image is valid. If it's not, skip this image and continue with the next one
        [...]
        // create filename
        std::stringstream cFileNameStr;
        cFileNameStr << [...];
        std::ofstream cRawFile( cFileNameStr.str().c_str(), std::ios::out | std::ios::binary );
        unsigned char * ucDataPtr = imgPtr->cImgPtr;
        if( cRawFile.is_open() )
        {
            // calculate file size
            unsigned int uiFileSize = [...];
            cRawFile.write(reinterpret_cast<char*>(ucDataPtr), uiFileSize);
            cRawFile.close();
        }
        // dump some metadata into a singleton class for logging
        [...]
    }
    m_ThreadRunningMutex.lock();
    m_WriterThreadRunning = false;
    m_ThreadRunningMutex.unlock();
}
```
ImageManager is a class that handles image acquisition and queues the acquired images. It also calls continueWriting(). This mechanism is necessary because images may be written faster than they are acquired: when the queue runs empty, writeImages() leaves its loop and the thread ends, so it has to be restarted when the next image arrives.
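For comparison, a common way to cope with a writer that is sometimes faster than acquisition, without restarting the thread, is a single long-lived writer thread that sleeps on a std::condition_variable while the queue is empty. This is a swapped-in design, not the code from the question: BlockingWriter, push(), stop(), writeItem() and writeItems() are hypothetical names, and plain ints stand in for std::shared_ptr<MyImage>.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical sketch: one long-lived writer thread instead of restarting it.
// ints stand in for images; writeItem() stands in for the std::ofstream code.
class BlockingWriter {
public:
    BlockingWriter() : m_thread(&BlockingWriter::run, this) {}
    ~BlockingWriter() { stop(); }

    // Producer side (called for every acquired item instead of continueWriting()).
    void push(int item) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push(item);
        }
        m_cv.notify_one(); // wake the writer if it is waiting
    }

    // Ask the writer to finish everything already queued, then wait for it to exit.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_cv.notify_one();
        if (m_thread.joinable()) m_thread.join();
    }

    int written() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_written;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_mutex);
        while (true) {
            // Sleep while there is nothing to do: no busy loop and no restart.
            m_cv.wait(lock, [this] { return m_stop || !m_queue.empty(); });
            if (m_queue.empty()) return; // stop requested and queue drained
            int item = m_queue.front();
            m_queue.pop();
            lock.unlock(); // do the slow disk write without holding the lock
            writeItem(item);
            lock.lock();
            ++m_written;
        }
    }

    void writeItem(int /*item*/) { /* placeholder for the raw file write */ }

    mutable std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<int> m_queue;
    bool m_stop = false;
    int m_written = 0;
    std::thread m_thread; // declared last so the members above exist before run() starts
};

// Usage: push a few items, stop, and report how many were written.
int writeItems(int count) {
    BlockingWriter writer;
    for (int i = 0; i < count; ++i) writer.push(i);
    writer.stop();
    return writer.written();
}
```

Because the thread never exits while images may still arrive, there is no joinable()/join()/restart step and no separate running flag that can get out of sync with the queue.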
Why does this thread stop running at random times at random positions and without any error?
Valgrind doesn't report anything in code under my control. I tried raising the thread's priority, but that made no difference. I also tried writing to a different disk, with the same result.
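I noticed you're unlocking the mutex immediately in both branches. Since all you're doing there is reading a bool, you probably don't need the mutex for it at all. Reading on its own usually isn't an operation that needs a lock (unless it has side effects, such as reading from a stream, or the memory may be deallocated). Strictly speaking, though, in C++ a plain bool that another thread writes at the same time is a data race, so the lock-free way to do this is to make the flag a std::atomic<bool>.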
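A minimal sketch of reading a flag without a mutex, assuming the flag is changed to std::atomic<bool>. waitForWorker() is a hypothetical demo function, and the counter loop stands in for writing images.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: one thread clears an atomic flag, another polls it
// without taking any lock.
int waitForWorker() {
    std::atomic<bool> running{true};
    int work = 0; // only the worker writes this, before clearing the flag

    std::thread worker([&] {
        for (int i = 0; i < 1000; ++i) ++work; // stands in for writing images
        running.store(false);                  // publish "finished"
    });

    // Lock-free read. The default memory order (seq_cst) also makes the
    // worker's writes to 'work' visible once false has been observed.
    while (running.load()) std::this_thread::yield();

    worker.join();
    assert(!running.load());
    return work;
}
```

This makes each individual read and write of the flag race-free; checking the flag and then starting a new thread are still two separate steps.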
Consider: you will never read true from that bool before it actually is true, and since this function only reads it at that point, there's no risk of it assigning the bool an incorrect value. You don't assign a new value to the bool here until after you've already joined your thread.
My assumption is that your code locks the mutex, and another thread then tries to write to the bool but can't, because the mutex is still locked.
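To rule out a mutex that is never released, one option is to stop calling lock()/unlock() by hand and use std::lock_guard, which unlocks when it goes out of scope, including when an exception is thrown. RunningFlag below is a hypothetical helper, not code from the question; its tryStart() also does the check and the set inside one critical section.

```cpp
#include <mutex>
#include <stdexcept>

// Hypothetical helper: the "writer running" flag guarded by a mutex that is
// only ever locked through a scoped std::lock_guard.
class RunningFlag {
public:
    bool get() const {
        std::lock_guard<std::mutex> lock(m_mutex); // released when 'lock' is destroyed
        return m_running;
    }

    void set(bool value) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_running = value;
    }

    // Check and set in one critical section: returns true only for the
    // caller that changed the flag from false to true.
    bool tryStart() {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_running) return false;
        m_running = true;
        return true;
    }

    // Simulates an error while the lock is held; the guard still unlocks
    // the mutex during stack unwinding.
    void setOrThrow(bool value, bool fail) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (fail) throw std::runtime_error("simulated failure");
        m_running = value;
    }

private:
    mutable std::mutex m_mutex;
    bool m_running = false;
};
```

If the mutex were left locked after the exception, a later get() would block forever instead of returning.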