Should I use Parallel.ForEach on a server for making many simultaneous web requests?


I've read a lot about Parallel.ForEach, but haven't really found a solid answer to my question.

We have a Windows Service that pulls rows from multiple databases every couple of minutes and, using a foreach loop, sends those rows off through web requests to complete the operations. All of these web requests are currently made sequentially and take too long, so we want to run them in parallel.

My initial investigation led me to believe that a Producer-Consumer approach using threads would be best, where every couple minutes a producer puts the rows into a thread-safe queue, and during initialization of the service I simply start up a number of consumer threads (say 10 for example, but potentially 100 or more), which constantly check the queue to see if there are rows that need to be sent off via a web request or not.

A co-worker suggested simply changing our foreach loop to a Parallel.ForEach instead. My first concern with this was that the ForEach would block all operations until all items in the enumeration completed, so if it had 10 items and 9 finished in 5 seconds and one finished in 5 minutes, it would be essentially doing nothing but that one request for 4 minutes and 55 seconds. That can be overcome by simply doing the Parallel.ForEach within a new thread, like so:

Task.Factory.StartNew(() => Parallel.ForEach<Item>(items, item => DoSomething(item)));

So with this what would happen is every couple of minutes a new Parallel.ForEach loop would be started with all of the new rows that had been added to the databases since the last check, even if previous Parallel.ForEach loops had not completed (i.e. that 5 minute long request would not block new requests from being made).

This is simple enough to do and drastically minimizes the code changes that need to be made, but I'm still concerned about running it on our server that hosts other services and websites. I've read that Parallel.ForEach can potentially peg all of the CPUs on a server, even though a simple web request is not a CPU-intensive operation. I know that I can limit the number of threads the loop will use via the MaxDegreeOfParallelism property, so I could set it to 10 or 100 or whatever. This is nice because instead of having 10 or 100 tasks constantly running and doing nothing, Parallel.ForEach would just spin up however many threads it needs and then close them when the loop is done. But I'm still hesitant that it may consume too many resources on the server.

So which of these options (or others) is best for my scenario? Are my concerns about using Parallel.ForEach on a server machine justified? It definitely looks like the "simpler" and "lazier" solution, so I just want to make sure it doesn't come back to bite me if we go with it. Also, I'm not concerned with scaling this solution out to multiple servers; just running on a single server that also runs other services and websites.

Update

The comments asked for some source code to provide more context.

Here is a simplified version of what we are currently doing:

void FunctionGetsCalledEvery2Minutes()
{
    // Synchronously loop over each database that we need to check.
    foreach (var database in databasesToCheck)
    {
        // Get the rows from this database.
        var rows = database.GetRowsFromTable();

        // Synchronously send each row to a web service to be processed.
        foreach (var request in rows)
        {
            SendRequestToWebServiceToBeProcessed(request);
        }
    }
}

void SendRequestToWebServiceToBeProcessed(DatabaseRow request)
{
    // Request may take anywhere from 1 second to 10 minutes.
    Thread.Sleep(_randomNumberGenerator.Next(1000, 600000));
}

Here is a simplified version of how the code would look using Parallel.ForEach:

void FunctionGetsCalledEvery2Minutes()
{
    // Synchronously loop over each database that we need to check.
    foreach (var database in databasesToCheck)
    {
        // Get the rows from this database.
        var rows = database.GetRowsFromTable();

        // Asynchronously send each row to a web service to be processed, processing no more than 30 at a time.
        // Call the Parallel.ForEach from a new Task so that it does not block until all rows have been sent.
        Task.Factory.StartNew(() => Parallel.ForEach<DatabaseRow>(rows, new ParallelOptions() { MaxDegreeOfParallelism = 30 }, SendRequestToWebServiceToBeProcessed));
    }
}

And here is a simplified version of how the code would look using producer-consumer:

private System.Collections.Concurrent.BlockingCollection<DatabaseRow> _threadSafeQueue = new System.Collections.Concurrent.BlockingCollection<DatabaseRow>();
void FunctionGetsCalledEvery2Minutes()
{
    // Synchronously loop over each database that we need to check.
    foreach (var database in databasesToCheck)
    {
        // Get the rows from this database.
        var rows = database.GetRowsFromTable();

        // Add the rows to the queue to be processed by the consumer threads.
        foreach (var row in rows)
        {
            _threadSafeQueue.Add(row);
        }
    }
}

void ConsumerCode()
{
    // Loop forever, taking requests off the queue and sending them away to
    // be processed. Take() blocks until an item is available.
    while (true)
    {
        var request = _threadSafeQueue.Take();
        SendRequestToWebServiceToBeProcessed(request);
    }
}

void CreateConsumerThreadsOnApplicationStartup(int numberOfConsumersToCreate)
{
    // Create the number of consumer threads specified.
    for (int i = 0; i < numberOfConsumersToCreate; i++)
    {
        // LongRunning gives each consumer its own dedicated thread instead
        // of tying up a thread-pool thread indefinitely.
        Task.Factory.StartNew(ConsumerCode, TaskCreationOptions.LongRunning);
    }
}

I have one synchronous producer in this example, but I could easily spin up an asynchronous producer thread for each database to poll.

One thing to note here is that in the Parallel.ForEach sample I limit it to using at most 30 threads at a time, but that limit only applies to that one instance. If 2 minutes elapse and that Parallel.ForEach loop still has 10 requests that haven't finished, the next invocation will spin up 30 new threads, for a total of 40 running simultaneously. So if the web requests have a timeout of, say, 10 minutes, we could easily end up with 150 threads running simultaneously (10 mins / 2 mins = function called 5 times, times 30 threads per instance = 150). This is a potential concern: if I bump up the maximum number of threads allowed, or start calling the function at an interval shorter than 2 minutes, I could soon be running thousands of threads simultaneously, consuming more resources on the server than I want. Is this a valid concern? The producer-consumer approach does not have this problem; it would only ever run as many threads as I specify in the numberOfConsumersToCreate variable.

It has been mentioned that I should use TPL Dataflow for this, but I've never used it before and don't want to spend a ton of time on this project. If TPL Dataflow is still the best option I would like to know, but I would also like to know which of these two approaches (Parallel.ForEach vs. producer-consumer) is better for my scenario.
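For reference, here is a minimal sketch of what the TPL Dataflow option might look like: a single ActionBlock acts as an always-running consumer with a global concurrency cap. This assumes the System.Threading.Tasks.Dataflow NuGet package and reuses the names from the samples above; the CreateSendBlockOnApplicationStartup method is hypothetical.

// Requires the System.Threading.Tasks.Dataflow NuGet package.
private System.Threading.Tasks.Dataflow.ActionBlock<DatabaseRow> _sendBlock;

void CreateSendBlockOnApplicationStartup()
{
    // One always-alive block; its MaxDegreeOfParallelism caps the TOTAL
    // number of concurrent requests, no matter how often rows are posted.
    _sendBlock = new System.Threading.Tasks.Dataflow.ActionBlock<DatabaseRow>(
        request => SendRequestToWebServiceToBeProcessed(request),
        new System.Threading.Tasks.Dataflow.ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 30 });
}

void FunctionGetsCalledEvery2Minutes()
{
    foreach (var database in databasesToCheck)
    {
        // Post returns immediately; the block processes the rows on its own
        // worker tasks, never more than 30 at a time across all invocations.
        foreach (var row in database.GetRowsFromTable())
        {
            _sendBlock.Post(row);
        }
    }
}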

Hopefully this gives more context so I can get better targeted answers. Thanks :)

2 Answers

Answer 1

If you have many short operations and an occasional long one, Parallel.ForEach will block until all of them are finished. However, while it's working on that one long request it won't peg all of your cores, just the one that is still working. Keep in mind, though, that it will attempt to use all of the cores while there are many items to work on.

EDIT:

For CPU-bound work there is no reason to set the MaxDegreeOfParallelism property above the number of threads your CPU can run concurrently (limited by the number of cores and the degree of hyper-threading); in fact, it's mainly useful for reducing the parallelism to a number below that.

Since blocking isn't a concern, Parallel.ForEach, while it may seem like the lazy option, is very appropriate if your items really can be processed concurrently.

Answer 2

I haven't looked deeply into your code, but I have some experience and advice to offer.

Parallel code can indeed be fast, but it depends on the number of cores: launching hundreds of CPU-bound threads on a quad-core machine isn't ideal; there, four threads would normally be better. There are exceptions, but in general you don't need to think about it, since the latest .NET versions handle the scheduling for you.

There is, however, another big concern with parallel code: you have no control over the order in which things are executed. If you print the loop variable i inside a parallel loop running from 0 to 100, you will not see 1, 2, 3, 4, 5, 6, 7 on screen but something chaotic. As each thread prints its own part of the number range, you'll see something like 1, 14, 37, 70, 2, 15, 80, ... Each number is written exactly once, but the order is unpredictable.
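A minimal sketch of that ordering behavior (the method name is just for illustration):

using System;
using System.Threading.Tasks;

void PrintNumbersInParallel()
{
    // The iterations are spread across threads, so the numbers come out
    // interleaved and unordered, e.g. 1, 14, 37, 70, 2, 15, 80, ...
    Parallel.For(0, 100, i => Console.WriteLine(i));
}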

Keep that in mind if you have some complex database work, e.g. you need to combine several lookups, do a complex calculation, and then create a new table. You might see a speed improvement if that complex calculation can be executed in parallel; if each calculation produces a key-value pair with a unique key for your database, you're fine.

But there can be problems with shared state in parallel math. Suppose you need to update a value, but another thread also needs to update it; what will the end result be? I know there are wait and lock mechanisms, but if you're in such a scenario I usually find it best to rethink and redesign the code, and to rethink the problem itself.
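As a minimal sketch of the kind of lost update being described here (the method name is just for illustration): counter++ is a read-modify-write, so two threads can read the same old value, both add one, and both write back the same result.

using System;
using System.Threading.Tasks;

void DemonstrateLostUpdates()
{
    int counter = 0;

    // counter++ is not atomic, so concurrent increments overwrite each other.
    Parallel.For(0, 1000000, i => counter++);

    // Typically prints a number well below 1000000.
    Console.WriteLine(counter);
}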

Maybe create extra arrays, lists, dictionaries, or tables to temporarily store per-thread results and combine them later; this is the usual way I tackle such problems.
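As a sketch of that pattern, Parallel.ForEach has an overload with per-thread local state: each thread accumulates into its own subtotal, and the subtotals are combined only once per thread at the end, so the hot loop needs no locks.

using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

void SumInParallel()
{
    int[] numbers = Enumerable.Range(1, 1000000).ToArray();
    long total = 0;

    Parallel.ForEach(
        numbers,
        () => 0L,                                  // per-thread initial subtotal
        (n, loopState, subtotal) => subtotal + n,  // accumulate thread-locally
        subtotal => Interlocked.Add(ref total, subtotal)); // combine once per thread

    Console.WriteLine(total); // always 500000500000
}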

Try to keep such math/logic short and simple, as described here, and you can usually achieve great speed boosts. I'm aware more can be done, but keeping the logic simple works fast and, where possible, sticking to simple logic also keeps your code clean; places where parallel code is used can already be complex enough.

One more note: if your complex calculation varies in execution time (a few ifs need to be checked and they might add extra complexity), then it is usually better to launch lots of mini 'math/code' parts rather than a few huge, complex ones. A large queue of mini tasks completes faster than a small queue of huge tasks, because the work balances more evenly across threads.