ManualResetEvent - WaitOne() does not seem to release thread at some point

2.8k Views Asked by At

I have a multi-threading form application and this is how the part in question is designed:

Thread 2 (BatchPreviewAssistant class) is waiting for the primary interface thread to pass images load task. Once the task is received, BatchPreviewAssistant assigns tasks to N=5 waiting PrimaryLoader threads and enables them. PrimaryLoaders are running as infinite loops started/stopped using 2 manual reset events: _startEvent and _endEvent. Also, there is an array of N manual reset events _parentSyncEvent to signal end of processing from PrimaryLoaders to BatchPreviewAssistant.

So normally each PrimaryLoader is waiting at _startEvent.WaitOne(). Once BatchPreviewAssistant needs to activate them and runs RunPrimaryLoaders(), it resets _endEvent and _parentSyncEvents first and then sets _startEvent. Now it blocks at WaitHandle.WaitAll(_parentSyncEvents The _startEvent.Set() causes all PrimaryLoader to proceed. Once each PrimaryLoader is done, it sets its own event in _parentSyncEvent until all 5 are set. At this point all PrimaryLoaders reach _endEvent.WaitOne() and wait. Now _parentSyncEvents are all set which enables BatchPreviewAssistant to continue. BatchPreviewAssistant resets _startEvent and then sets _endEvent which releases PrimaryLoaders and they come back to the beginning of the loop.

BatchPreviewAssistant:

    private void RunPrimaryLoaders()
    {
        BatchPreviewThreadsLogger.WriteLog(Common.LogLevel.Debug1, "RunPrimaryLoaders()");
        ResetEvents(_parentSyncEvents);
        _endEvent.Reset();
        _startEvent.Set();

        // Primary Loader loops restart

        BatchPreviewThreadsLogger.WriteLog(Common.LogLevel.Debug2, "WaitHandle.WaitAll(_parentSyncEvent");
        if (!WaitHandle.WaitAll(_parentSyncEvents, 20 * 1000))
        {
            throw new TimeoutException("WaitAll(_parentSyncEvent) in ProcessCurrentCommand");
            // TODO: Terminate?
        }
        BatchPreviewThreadsLogger.WriteLog(Common.LogLevel.Message3, "Primary loading is complete");
        _startEvent.Reset();
        _endEvent.Set();
        bool isEndEventSet = _endEvent.WaitOne(0);
        BatchPreviewThreadsLogger.WriteLog(Common.LogLevel.Debug2, "isEndEventSet?" + isEndEventSet.ToString());
    }

PrimaryLoader:

    public void StartProc(object arg)
    {
        while (true)
        {
            BatchPreviewThreadsLogger.WriteLog(Common.LogLevel.Debug2, "Primary Loader: _startEvent.WaitOne()");
            _startEvent.WaitOne();

            try
            {
                BatchPreviewThreadsLogger.WriteLog(Common.LogLevel.Message4, "Primary Loader is processing entry:" + processingEntry.DisplayPosition.ToString());
            }
            catch (Exception ex)
            {
                BatchPreviewThreadsLogger.WriteLog(Common.LogLevel.Error, "Exception in PrimaryImageLoader.StartProc:" + ex.ToString());
            }
            _parentSyncEvent.Set();
            BatchPreviewThreadsLogger.WriteLog(Common.LogLevel.Debug2, "Primary Loader: _endEvent.WaitOne()");
            _endEvent.WaitOne();
        }
    }

This code works pretty good making hundreds of such loops but I get an issue every once in a while, specifically during stress tests. What happens is that when BatchPreviewAssistant sets _endEvent.Set(), none of PrimaryLoaders are released at _endEvent.WaitOne(); You can see that I check in BatchPreviewAssistant and see that the event is really set, however PrimaryLoaders are not released.

[10/27/2011;21:24:42.796;INFO ] [42-781:16]Primary Loader: _endEvent.WaitOne()
[10/27/2011;21:24:42.796;INFO ] [42-781:18]Primary Loader: _endEvent.WaitOne()
[10/27/2011;21:24:42.796;INFO ] [42-781:19]Primary Loader: _endEvent.WaitOne()
[10/27/2011;21:24:42.843;INFO ] [42-843:15]Primary Loader: _endEvent.WaitOne()
[10/27/2011;21:24:42.937;INFO ] [42-937:17]Primary Loader: _endEvent.WaitOne()
[10/27/2011;21:24:42.937;INFO ] [42-937:14]Primary loading is complete
[10/27/2011;21:24:42.937;INFO ] [42-937:14]isEndEventSet?True

Is there any abvious problems with such design that may cause the issue? I can see some ways to try as work around, however it would be nice to see what is wrong with this aproach.

Just in case I am also providing informatin on the way I initialize and start PrimaryLoaders.

private PrimaryImageLoader[] _primaryImageLoaders;

_primaryImageLoaders = new PrimaryImageLoader[N]

for (int i = 0; i < _primaryImageLoaderThreads.Length; i++)
{
  _parentSyncEvents[i] = new AutoResetEvent(false);
  _primaryImageLoaders[i] = new PrimaryImageLoader(i, _parentSyncEvents[i], 
      _startEvent, _endEvent,
      _pictureBoxes, _asyncOperation,
      LargeImagesBufferCount);
  _primaryImageLoaderThreads[i] = new Thread(new ParameterizedThreadStart(_primaryImageLoaders[i].StartProc));
  _primaryImageLoaderThreads[i].Start();
}

Please note that some irrelevant code has been removed to simplify the sample

ADDED: I would agree that the sample is too busy and difficult to follow. So this is it in the nutshell:

Thread 2:
private void RunPrimaryLoaders()
{
  _endEvent.Reset();
  _startEvent.Set();

  _startEvent.Reset();
  _endEvent.Set();
  bool isEndEventSet = _endEvent.WaitOne(0);
}

Threads 3-7:
public void StartProc(object arg)
{
  while (true)
  {
    _startEvent.WaitOne();

    _endEvent.WaitOne();     // This is where it can't release occasionally although Thread 2 checks and logs that the event is set
  }
}
2

There are 2 best solutions below

0
On

Is there any abvious problems with such design that may cause the issue?

It seems like you are coming up with a very complicated design when you might be trying to do a simple thing. It seems that a simple Producer/Consumer pattern would work much better and you would not have to deal with this calamity of manual reset events.

You probably want something more along the lines of this:

class Producer
{
    private readonly BlockingQueue<Task> _queue;

    public Producer(BlockingQueue<Task> queue)
    {
        _queue = queue;
    }

    public LoadImages(List<Task> imageLoadTasks)
    {
        foreach(Task t in imageLoadTasks)
        {
            _queue.Enqueue(task);
        }
    }
}

class Consumer
{
    private volatile bool _running;
    private readonly BlockingQueue<Task> _queue;

    public Consumer(BlockingQueue<Task> queue)
    {
        _queue = queue;
        _running = false;
    }

    public Consume()
    {
        _running = true;

        while(_running)
        {
            try
            {
                // Blocks on dequeue until there is a task in queue
                Task t = _queue.Dequeue();

                // Execute the task after it has been dequeued
                t.Execute();
            }
            catch(ThreadInterruptedException)
            {
                // The exception will take you out of a blocking
                // state so you can check the running flag and decide
                // if you need to exit the loop or if you shouldn't.
            }
        }
    }
}

So you would have to run each Producer instance on a separate thread and each Consumer instance on its own thread too. Of course, you have to add in all the bells and whistles to terminate them gracefully, but that's another story.

0
On

You have a race condition. If your logic is that you detect a condition, set an event to block, and then wait on the event, there must be an intervening unlock.

Your code does this:

  1. Decide to wait

  2. Set event to block

  3. Wait on event

The problem occurs if the event occurs between steps 1 and 2. The event may have already occurred and unblocked the event when we set the event to block. When we get to step 3, we are waiting for an event that has already occurred to unblock an object it has already unblocked. Bad.

The fix is as follows:

  1. Acquire lock

  2. Do we need to wait? If no, release lock and return

  3. Set event to block

  4. Release lock

  5. Wait on event

Because we hold a lock now, the event cannot occur between when we decide to wait and when we set the event to block. The code that unblocks the event must, of course, hold the same lock as it goes through the logic of processing the event and unblocking the event.