How do I determine an appropriate value for MaxDegreeOfParallelism when using Parallel.ForEachAsync?


The example Scott Hanselman gives on his blog for using Parallel.ForEachAsync in .NET 6 specifies the value of MaxDegreeOfParallelism as 3.

However, if unspecified, the default MaxDegreeOfParallelism is ProcessorCount. This makes sense for CPU-bound work, but for asynchronous I/O-bound work, it seems like a poor choice for a default value.

If I'm doing something like in Scott's example below, but I want to do it as fast as possible, how should I determine the best value to use for MaxDegreeOfParallelism? Is it reasonable to specify this as int.MaxValue and just assume the TaskScheduler will do the most sensible thing when it comes to scheduling the work on the ThreadPool?

// From Scott's example: 'userHandlers' is a collection of GitHub API paths
// and 'client' is an HttpClient whose BaseAddress is https://api.github.com.
ParallelOptions parallelOptions = new()
{
    MaxDegreeOfParallelism = 3 // at most 3 requests in flight at once
};

await Parallel.ForEachAsync(userHandlers, parallelOptions, async (uri, token) =>
{
    var user = await client.GetFromJsonAsync<GitHubUser>(uri, token);

    Console.WriteLine($"Name: {user.Name}\nBio: {user.Bio}\n");
});
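For reference, the variant the question is asking about would look like the sketch below. Note that because the loop body is asynchronous, MaxDegreeOfParallelism caps concurrent iterations, not threads.

// The int.MaxValue variant from the question: effectively unbounded.
// With an async body, MaxDegreeOfParallelism limits how many iterations
// run concurrently; it does not pin that many ThreadPool threads.
ParallelOptions unbounded = new()
{
    MaxDegreeOfParallelism = int.MaxValue
};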
1 Answer

Answered by tmaj:

IMHO, the only way to get the number is... testing.

For HTTP work there are two parties involved:

  1. your code, and
  2. the remote side that does the work for you.

Your "fast" may be too fast for the remote side. This can be because of resource limits and/or throttling.
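One way to run that test, as a rough sketch: time a batch of real requests at several candidate values and pick the best performer. The endpoint, paths, and DOP candidates below are all placeholders, and this assumes a .NET 6 top-level program with implicit usings.

using System.Diagnostics;

using HttpClient client = new() { BaseAddress = new Uri("https://example.com/") };
var paths = Enumerable.Range(0, 200).Select(i => $"items/{i}").ToList();

foreach (var dop in new[] { 1, 2, 4, 8, 16, 32, 64 })
{
    var sw = Stopwatch.StartNew();

    await Parallel.ForEachAsync(
        paths,
        new ParallelOptions { MaxDegreeOfParallelism = dop },
        async (path, token) =>
        {
            using var response = await client.GetAsync(path, token);
            // A spike in failures here matters as much as the timing:
            // it may mean you have hit the remote side's throttling.
        });

    Console.WriteLine($"DOP {dop,3}: {sw.Elapsed.TotalSeconds:F1}s");
}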

Note on the default

The default, which results in ProcessorCount, will depend on the machine the code runs on; if you run your code in the cloud, this number may be different from what's on your beefy laptop.

This can lead to unexpected differences between non-prod and prod environments.
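A small illustration of the point: you can print what the default would be on a given machine, and pinning the value explicitly avoids the surprise. The value 8 below is just a placeholder.

// The implicit default is derived from the machine running the code:
Console.WriteLine($"ProcessorCount: {Environment.ProcessorCount}");
// e.g. 16 on a developer laptop, but perhaps 2 on a small cloud instance.

// Setting the value explicitly makes behavior consistent across environments:
ParallelOptions options = new()
{
    MaxDegreeOfParallelism = 8 // a value found by testing, not by hardware
};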

GitHub-specific

GitHub.com has a limit of 5,000 requests per hour for non-enterprise users (from here), and there is also this:

In order to provide quality service on GitHub, additional rate limits may apply to some actions when using the API. For example, using the API to rapidly create content, poll aggressively instead of using webhooks, make multiple concurrent requests, or repeatedly request data that is computationally expensive may result in secondary rate limiting.
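One way to watch how close you are to those limits is to read the rate-limit headers GitHub sends back on API responses. The sketch below assumes the documented X-RateLimit-Remaining header; the user-agent string and path are placeholders.

// Sketch: inspect GitHub's rate-limit headers to know when to back off.
using HttpClient client = new() { BaseAddress = new Uri("https://api.github.com") };
client.DefaultRequestHeaders.UserAgent.ParseAdd("my-app"); // GitHub requires a User-Agent

using var response = await client.GetAsync("users/shanselman");
if (response.Headers.TryGetValues("X-RateLimit-Remaining", out var remaining))
{
    Console.WriteLine($"Requests remaining this hour: {remaining.First()}");
}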

In Best practices for integrators, we can read:

Dealing with secondary rate limits

Secondary rate limits are another way we ensure the API's availability. To avoid hitting this limit, you should ensure your application follows the guidelines below.

  • ...
  • Make requests for a single user or client ID serially. Do not make requests for a single user or client ID concurrently.
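If you need to honor that last guideline while still using Parallel.ForEachAsync, one option (my sketch, not part of the quoted guidance) is to parallelize across users but keep each individual user's requests serial. This assumes the 'client' and 'parallelOptions' from earlier; the user names and paths are placeholders.

// Sketch: concurrency across users, strict serialization within a user.
var requestsByUser = new Dictionary<string, string[]>
{
    ["alice"] = new[] { "users/alice", "users/alice/repos" },
    ["bob"]   = new[] { "users/bob", "users/bob/repos" },
};

await Parallel.ForEachAsync(requestsByUser, parallelOptions, async (entry, token) =>
{
    foreach (var path in entry.Value) // one request at a time per user
    {
        using var response = await client.GetAsync(path, token);
        // ... process the response ...
    }
});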