How do I determine an appropriate value for MaxDegreeOfParallelism when using Parallel.ForEachAsync?


The example Scott Hanselman gives on his blog for using Parallel.ForEachAsync in .NET 6 specifies the value of MaxDegreeOfParallelism as 3.

However, if unspecified, the default MaxDegreeOfParallelism is ProcessorCount. This makes sense for CPU-bound work, but for asynchronous I/O-bound work, it seems like a poor choice for a default value.

If I'm doing something like in Scott's example below, but I want to do it as fast as possible, how should I determine the best value to use for MaxDegreeOfParallelism? Is it reasonable to specify this as int.MaxValue and just assume the TaskScheduler will do the most sensible thing when it comes to scheduling the work on the ThreadPool?

// From Scott's example: 'userHandlers' is a collection of GitHub API paths
// and 'client' is an HttpClient whose BaseAddress is https://api.github.com.
ParallelOptions parallelOptions = new()
{
    MaxDegreeOfParallelism = 3 // at most 3 requests in flight at once
};

await Parallel.ForEachAsync(userHandlers, parallelOptions, async (uri, token) =>
{
    var user = await client.GetFromJsonAsync<GitHubUser>(uri, token);

    Console.WriteLine($"Name: {user.Name}\nBio: {user.Bio}\n");
});
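For reference, the variant the question is asking about would look like the sketch below. Note that because the loop body is asynchronous, MaxDegreeOfParallelism caps concurrent iterations, not threads.

// The int.MaxValue variant from the question: effectively unbounded.
// With an async body, MaxDegreeOfParallelism limits how many iterations
// run concurrently; it does not pin that many ThreadPool threads.
ParallelOptions unbounded = new()
{
    MaxDegreeOfParallelism = int.MaxValue
};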
1 Answer

Answered by tmaj:

IMHO, the only way to get the number is... testing.

For HTTP work there are two parties involved:

  1. your code, and
  2. the remote side that does the work for you.

Your "fast" may be too fast for the remote side. This can be because of resource limits and/or throttling.
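One way to run that test, as a rough sketch: time a batch of real requests at several candidate values and pick the best performer. The endpoint, paths, and DOP candidates below are all placeholders, and this assumes a .NET 6 top-level program with implicit usings.

using System.Diagnostics;

using HttpClient client = new() { BaseAddress = new Uri("https://example.com/") };
var paths = Enumerable.Range(0, 200).Select(i => $"items/{i}").ToList();

foreach (var dop in new[] { 1, 2, 4, 8, 16, 32, 64 })
{
    var sw = Stopwatch.StartNew();

    await Parallel.ForEachAsync(
        paths,
        new ParallelOptions { MaxDegreeOfParallelism = dop },
        async (path, token) =>
        {
            using var response = await client.GetAsync(path, token);
            // A spike in failures here matters as much as the timing:
            // it may mean you have hit the remote side's throttling.
        });

    Console.WriteLine($"DOP {dop,3}: {sw.Elapsed.TotalSeconds:F1}s");
}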

Note on the default

The default, which results in ProcessorCount, will depend on the machine the code runs on; if you run your code in the cloud, this number may be different from what's on your beefy laptop.

This can lead to unexpected differences between non-prod and prod environments.
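A small illustration of the point: you can print what the default would be on a given machine, and pinning the value explicitly avoids the surprise. The value 8 below is just a placeholder.

// The implicit default is derived from the machine running the code:
Console.WriteLine($"ProcessorCount: {Environment.ProcessorCount}");
// e.g. 16 on a developer laptop, but perhaps 2 on a small cloud instance.

// Setting the value explicitly makes behavior consistent across environments:
ParallelOptions options = new()
{
    MaxDegreeOfParallelism = 8 // a value found by testing, not by hardware
};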

GitHub-specific

GitHub.com has a limit of 5,000 requests per hour for non-enterprise users (from here), and there is also this:

In order to provide quality service on GitHub, additional rate limits may apply to some actions when using the API. For example, using the API to rapidly create content, poll aggressively instead of using webhooks, make multiple concurrent requests, or repeatedly request data that is computationally expensive may result in secondary rate limiting.
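One way to watch how close you are to those limits is to read the rate-limit headers GitHub sends back on API responses. The sketch below assumes the documented X-RateLimit-Remaining header; the user-agent string and path are placeholders.

// Sketch: inspect GitHub's rate-limit headers to know when to back off.
using HttpClient client = new() { BaseAddress = new Uri("https://api.github.com") };
client.DefaultRequestHeaders.UserAgent.ParseAdd("my-app"); // GitHub requires a User-Agent

using var response = await client.GetAsync("users/shanselman");
if (response.Headers.TryGetValues("X-RateLimit-Remaining", out var remaining))
{
    Console.WriteLine($"Requests remaining this hour: {remaining.First()}");
}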

In Best practices for integrators, we can read:

Dealing with secondary rate limits

Secondary rate limits are another way we ensure the API's availability. To avoid hitting this limit, you should ensure your application follows the guidelines below.

  • ...
  • Make requests for a single user or client ID serially. Do not make requests for a single user or client ID concurrently.
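If you need to honor that last guideline while still using Parallel.ForEachAsync, one option (my sketch, not part of the quoted guidance) is to parallelize across users but keep each individual user's requests serial. This assumes the 'client' and 'parallelOptions' from earlier; the user names and paths are placeholders.

// Sketch: concurrency across users, strict serialization within a user.
var requestsByUser = new Dictionary<string, string[]>
{
    ["alice"] = new[] { "users/alice", "users/alice/repos" },
    ["bob"]   = new[] { "users/bob", "users/bob/repos" },
};

await Parallel.ForEachAsync(requestsByUser, parallelOptions, async (entry, token) =>
{
    foreach (var path in entry.Value) // one request at a time per user
    {
        using var response = await client.GetAsync(path, token);
        // ... process the response ...
    }
});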