ThreadPool in C# is too slow; is there a way to speed it up? Thread.Sleep(0) and QueueUserWorkItem issues


I am using the ThreadPool in a C# application that needs to do some CPU-intensive work, but it seems too slow. (EDIT: it prints the debug string "Calculating on " + lSubArea.X + ":" + lSubArea.Y + " " + lSubArea.Width + ":" + lSubArea.Height only a few times every 10 seconds, while I expect to see it at least NUM_ROWS_GRID^2 = 16 times every few seconds.) Changing MinThreads via the SetMinThreads method does not help either. I don't know whether I should switch to custom threads or whether there is a way to speed this up. Searching Google returned some results, but nothing worked; same with MSDN.

Old Code follows:

private void StreamerRoutine()
{
   if (this._state.Area.Width == 0 && this._state.Area.Height == 0)
      this._state.Area = new Rectangle(0, 0, Screen.PrimaryScreen.Bounds.Width, Screen.PrimaryScreen.Bounds.Height);

   while (this._state.WorkEnd == false)
   {
      // Ends time slice if video is off
      if (this._state.VideoOn == false)
         Thread.Sleep(0);
      else
      {
         lock(this._state.AreaSync)
         {
             Int32 lWidth = this._state.Area.Width / Constants.NUM_ROWS_GRID;
             Int32 lHeight = this._state.Area.Height / Constants.NUM_ROWS_GRID;
             for (Int32 lX = 0; lX + lWidth <= this._state.Area.Width; lX += lWidth)
                for (Int32 lY = 0; lY + lHeight <= this._state.Area.Height; lY += lHeight)
                   ThreadPool.QueueUserWorkItem(CreateDiffFrame, (Object)new Rectangle(lX, lY, lWidth, lHeight));
         }
      }
    }
}

private void CreateDiffFrame(Object pState)
{
   Rectangle lSubArea = (Rectangle)pState;

   SmartDebug.DWL("Calculating on " 
          + lSubArea.X + ":" + lSubArea.Y + " " 
          + lSubArea.Width + ":" + lSubArea.Height);
   // TODO : calculate frame
   Thread.Sleep(0);
}

EDIT: The CreateDiffFrame function is only a stub that I use to see how many times it is called per second. It will be replaced with CPU-intensive work once I work out the best way to use threads in this case.

EDIT: I removed all the Thread.Sleep(0) calls; I thought they might speed up the routine, but it seems they could be a bottleneck. New code follows:

EDIT: I made WorkEnd and VideoOn volatile to avoid reading cached values and ending up in an endless loop. I also added a semaphore so that each batch of work items starts only after the previous batch is done. Now it is working quite well.

private void StreamerRoutine()
    {
        if (this._state.Area.Width == 0 && this._state.Area.Height == 0)
            this._state.Area = new Rectangle(0, 0, Screen.PrimaryScreen.Bounds.Width, Screen.PrimaryScreen.Bounds.Height);

        this._state.StreamingSem = new Semaphore(Constants.NUM_ROWS_GRID * Constants.NUM_ROWS_GRID, Constants.NUM_ROWS_GRID * Constants.NUM_ROWS_GRID);


        while (this._state.WorkEnd == false)
        {
            if (this._state.VideoOn == true)
            {
                for (int i = 0; i < Constants.NUM_ROWS_GRID * Constants.NUM_ROWS_GRID; i++)
                    this._state.StreamingSem.WaitOne();

                lock(this._state.AreaSync)
                {
                    Int32 lWidth = this._state.Area.Width / Constants.NUM_ROWS_GRID;
                    Int32 lHeight = this._state.Area.Height / Constants.NUM_ROWS_GRID;
                    for (Int32 lX = 0; lX + lWidth <= this._state.Area.Width; lX += lWidth)
                        for (Int32 lY = 0; lY + lHeight <= this._state.Area.Height; lY += lHeight)
                            ThreadPool.QueueUserWorkItem(CreateDiffFrame, (Object)new Rectangle(lX, lY, lWidth, lHeight));

                }
            }
        }
    }

private void CreateDiffFrame(Object pState)
    {
        Rectangle lSubArea = (Rectangle)pState;

        SmartDebug.DWL("Calculating on " + lSubArea.X + ":" + lSubArea.Y + " " + lSubArea.Width + ":" + lSubArea.Height);
        // TODO : calculate frame
        this._state.StreamingSem.Release(1);

    }

There are 3 best solutions below

1
On BEST ANSWER

There really isn't a good way to tell you exactly what's making your code slow from what I see, but there are a couple of things that stand out:

  1. Thread.Sleep(0). When you do this, you give up the rest of your timeslice from the OS, and slow down everything, because CreateDiffFrame() can't actually return until the OS scheduler comes back to it.

  2. The object cast of Rectangle, which is a struct. You incur the overhead of boxing when this happens, which isn't going to be something you'll want for truly compute-intensive operations.

  3. Your calls to lock(this._state.AreaSync). It could be that AreaSync is being locked somewhere else, too, and that could be slowing things down.

  4. You may be queueing items too granularly -- if you queue very small items of work, it's likely that the overhead of putting these items in the queue one at a time could be significant as compared to the actual amount of work done. You could also perhaps consider putting the contents of the inner loop inside the queued work item to cut down this overhead.
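Points 2 and 4 can be illustrated with a minimal sketch that queues one work item per grid column instead of one per tile and never casts a struct to Object. The `Tile` struct (a stand-in for `Rectangle`), `ProcessTile`, the grid size of 4, and the 1920x1080 area are assumptions made for illustration, and `CountdownEvent` requires .NET 4 or later:

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for System.Drawing.Rectangle so the sketch needs no extra references.
struct Tile
{
    public int X, Y, Width, Height;

    public Tile(int x, int y, int width, int height)
    {
        X = x; Y = y; Width = width; Height = height;
    }
}

static class CoarseQueueSketch
{
    const int NumRowsGrid = 4;   // assumed; the question implies NUM_ROWS_GRID^2 = 16

    static int _processed;

    // Placeholder for the real per-tile work (hypothetical helper).
    static void ProcessTile(Tile tile)
    {
        Interlocked.Increment(ref _processed);
    }

    static void Main()
    {
        int areaWidth = 1920, areaHeight = 1080;   // assumed capture area
        int tileWidth = areaWidth / NumRowsGrid;
        int tileHeight = areaHeight / NumRowsGrid;

        // One signal per queued work item; Wait() returns when all have signalled.
        using (var done = new CountdownEvent(NumRowsGrid))
        {
            for (int col = 0; col < NumRowsGrid; col++)
            {
                int x = col * tileWidth;   // local copy so each lambda captures its own value

                // One work item per column: the inner loop runs inside the work item.
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try
                    {
                        for (int row = 0; row < NumRowsGrid; row++)
                            ProcessTile(new Tile(x, row * tileHeight, tileWidth, tileHeight));
                    }
                    finally
                    {
                        done.Signal();
                    }
                });
            }

            done.Wait();   // block until the whole batch is finished
        }

        Console.WriteLine("Tiles processed: " + _processed);
    }
}
```

Each `Tile` is created inside the work item, so no struct is boxed, and the pool receives 4 queue operations per batch rather than 16. Whether this helps in practice depends on how expensive the real per-tile work turns out to be.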

If this is something you're trying to do for parallel computation, have you investigated using PLINQ or another such framework?
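As one example of such a framework, here is a rough sketch using `Parallel.For` from the Task Parallel Library (.NET 4 or later). The `ProcessTile` helper, the grid size, and the area dimensions are assumptions for illustration, not part of the original code:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class ParallelGridSketch
{
    const int NumRowsGrid = 4;   // assumed; matches NUM_ROWS_GRID^2 = 16 in the question

    static int _processed;

    // Placeholder for the real diff computation on one tile (hypothetical helper).
    static void ProcessTile(int x, int y, int width, int height)
    {
        Interlocked.Increment(ref _processed);
    }

    static void Main()
    {
        int areaWidth = 1920, areaHeight = 1080;   // assumed capture area
        int tileWidth = areaWidth / NumRowsGrid;
        int tileHeight = areaHeight / NumRowsGrid;

        // Parallel.For partitions the index range across pool threads and
        // returns only after every iteration has completed.
        Parallel.For(0, NumRowsGrid * NumRowsGrid, i =>
        {
            int x = (i % NumRowsGrid) * tileWidth;
            int y = (i / NumRowsGrid) * tileHeight;
            ProcessTile(x, y, tileWidth, tileHeight);
        });

        Console.WriteLine("Tiles processed: " + _processed);
    }
}
```

Because `Parallel.For` blocks until the batch is done, it would also stand in for the hand-rolled semaphore that makes each batch wait for the previous one.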

1
On

My guess would be that it's the Sleep at the end of CreateDiffFrame. It means each thread stays alive for at least another 10 ms, if I remember correctly. You can probably do the actual work in less than 10 ms. ThreadPool tries to optimize the usage of threads, but I think it has an upper limit to the total number of outstanding threads. So if you want to actually mimic your workload, make a tight loop that waits until the expected number of milliseconds have passed instead of a Sleep.
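A tight loop of that kind could look like the sketch below. `SimulateCpuWork` is a hypothetical helper name and the 5 ms duration is an arbitrary assumption:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class BusyWorkSketch
{
    // Keeps the calling thread busy until roughly the requested time has elapsed.
    static void SimulateCpuWork(int milliseconds)
    {
        Stopwatch timer = Stopwatch.StartNew();
        while (timer.ElapsedMilliseconds < milliseconds)
        {
            Thread.SpinWait(100);   // burn CPU cycles without giving up the time slice
        }
    }

    static void Main()
    {
        Stopwatch total = Stopwatch.StartNew();
        SimulateCpuWork(5);   // assumed stand-in for about 5 ms of real computation
        Console.WriteLine("Elapsed ms: " + total.ElapsedMilliseconds);
    }
}
```

Calling `SimulateCpuWork` from the stub in place of `Thread.Sleep(0)` keeps the pool thread busy the way real computation would, so the measured call rate is closer to what the finished code will see.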

Anyway, I don't think the ThreadPool is the actual bottleneck; using another threading mechanism will not speed up your code.

0
On

There is a known bug with the ThreadPool.SetMinThreads method, described in KB976898:

After you use the ThreadPool.SetMinThreads method in the Microsoft .NET Framework 3.5, threads maintained by the thread pool do not work as expected

You can download a fix for this behavior from here.