C# Parallel.ForEach and Task.WhenAll sometimes returning less values then supposed

2k Views Asked by At

I have this:

Parallel.ForEach(numbers, (number) =>
{
    var value = Regex.Replace(number, @"\s+", "%20");

    tasks.Add(client.GetAsync(url + value));
});

await Task.WhenAll(tasks).ConfigureAwait(false);

foreach (var task in tasks)
{
  ...
}

Sometimes returns less tasks when reaching the foreach(var task in tasks), but after a few requests, starts returning all the tasks.

Ive changed the ConfigureAwait to true and still sometimes returns less tasks.

BTW Im using Parallel.ForEach beacuse each client.GetAsync(url + value) its a request to an external api with the particularity that its latency SLA is lower than 1s for 99% of its requests

Can you guys explain me why it returns less tasks sometimes?

And is there a way to guarantee returning always all tasks?

Thanks

2

There are 2 best solutions below

7
On BEST ANSWER

And is there a way to guarantee returning always all tasks?

Several people in the comments are pointing out you should just do this, on the assumption that numbers is a non-threadsafe List:

    foreach(var number in numbers)
    {
        var value = Regex.Replace(number, @"\s+", "%20");

        tasks.Add(client.GetAsync(url + value));
    }

    await Task.WhenAll(tasks).ConfigureAwait(false);

    foreach (var task in tasks)
    {
      ...
    }

There doesn't seem to be any considerable benefit in parallelizing the creation of the tasks that do the download; this happens very quickly. The waiting for the downloads to complete is done in the WhenAll

ps; there are a variety of more involved ways to escaping data for a URL, but if you're specifically looking to convert any kind of whitespace to %20, I guess it makes sense to do it with regex..

Edit; you asked when to use a Parallel ForEach, and I'm going to say "don't, generally, because you have to be more careful about th contexts within which you use it", but if you made the Parallel.ForEach do more syncronous work, it might make sense:

    Parallel.ForEach(numbers, number =>
    {
        var value = Regex.Replace(number, @"\s+", "%20");

        var r = client.Get(url + value));

        //do something meaningful with r here, i.e. whatever ... is in your  foreach (var task in tasks)

    });

but be mindful if you're performing updates to some shared thing, for coordination purposes, from within the body then it'll need to be threadsafe

6
On

You haven't shown it, so we can only guess but I assume that tasks is a List<>. This collection type is not thread-safe; your parallel loop is likely "overwriting" values. Either perform manual locking of your list or switch to a thread-safe collection such as a ConcurrentQueue<>

var tasks = new ConcurrentQueue<Task<string>>();

Parallel.ForEach(numbers, number =>
{
    var value = Regex.Replace(number, @"\s+", "%20");
    tasks.Enqueue(client.GetAsync(url + value));
});

await Task.WhenAll(tasks.ToArray()).ConfigureAwait(false);

foreach (var task in tasks)
{
   // whatever 
}

That said, your use of Parallel.ForEach is quite suspect. You aren't performing anything of real significance inside the loop. Use of Parallel, especially with proper locking, likely has higher overhead negating any potential gains you claim to observe or are realized by paralellizing the Regex calls. I would convert this to a normal foreach loop and precompile the Regex to offset (some of) its overhead:

// in class
private static readonly Regex SpaceRegex = new Regex(@"\s+", RegexOptions.Compiled);

// in method
var tasks = new List<Task<string>>();

foreach (var number in numbers)
{
    var value = SpaceRegex.Replace(number, "%20");
    tasks.Add(client.GetAsync(url + value));
}

await Task.WhenAll(tasks).ConfigureAwait(false);

foreach (var task in tasks)
{
   // whatever 
}

Alternatively, don't use a regex at all. Use a proper Uri escaping mechanism which will have the added benefit of fixing more than just spaces:

var value = Uri.EscapeDataString(number);
// or
var fullUri = Uri.EscapeUriString(url + number);

Note there are two different methods there. The proper one to use depends on the values of url and number. There's also other mechanisms such as the HttpUtility.UrlEncode method... but I think these are the preferred ones.