C#: AsParallel - does order matter?

I'm building a simple LINQ to Objects query which I'd like to parallelize, but I'm wondering whether the order of the statements matters.

e.g.

IList<RepeaterItem> items;

var result = items
        .Select(item => item.FindControl("somecontrol"))
        .Where(ctrl => SomeCheck(ctrl))
        .AsParallel();

vs.

var result = items
        .AsParallel()
        .Select(item => item.FindControl("somecontrol"))
        .Where(ctrl => SomeCheck(ctrl));

Would there be any difference?

There are 2 best solutions below

4
On BEST ANSWER

Absolutely. In the first case, the projection and filtering will be done in series, and only then will anything be parallelized.

In the second case, both the projection and filtering will happen in parallel.

Unless you have a particular reason to use the first version (e.g. the projection has thread affinity, or some other oddness) you should use the second.

EDIT: Here's some test code. Flawed as many benchmarks are, but the results are reasonably conclusive:

using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;

class Test
{
    static void Main()
    {
        var query = Enumerable.Range(0, 1000)
                              .Select(SlowProjection)
                              .Where(x => x > 10)
                              .AsParallel();
        Stopwatch sw = Stopwatch.StartNew();
        int count = query.Count();
        sw.Stop();
        Console.WriteLine("Count: {0} in {1}ms", count,
                          sw.ElapsedMilliseconds);

        query = Enumerable.Range(0, 1000)
                          .AsParallel()
                          .Select(SlowProjection)
                          .Where(x => x > 10);
        sw = Stopwatch.StartNew();
        count = query.Count();
        sw.Stop();
        Console.WriteLine("Count: {0} in {1}ms", count,
                          sw.ElapsedMilliseconds);
    }

    static int SlowProjection(int input)
    {
        Thread.Sleep(100);
        return input;
    }
}

Results:

Count: 989 in 100183ms
Count: 989 in 13626ms

Now there's a lot of heuristic stuff going on in PFX (the Parallel Extensions), but it's pretty obvious that the first result hasn't been parallelized at all, whereas the second has.
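
Timings can be noisy, so here is a rough sketch that looks at the same thing from another angle: it counts how many projections are executing at the same moment. The names ConcurrencyProbe, TrackedProjection and PeakFor are made up for this illustration, and the 20ms sleep is an arbitrary stand-in for real work. With AsParallel() at the end, the peak should stay at 1, because the projection runs while the source is being enumerated one element at a time. With AsParallel() at the start, the peak is typically higher on a multi-core machine, though the exact number depends on the scheduler.

```csharp
using System;
using System.Linq;
using System.Threading;

public class ConcurrencyProbe
{
    static int running;     // projections executing right now
    static int maxRunning;  // highest value of 'running' observed

    // Hypothetical stand-in for an expensive projection.
    static int TrackedProjection(int input)
    {
        int now = Interlocked.Increment(ref running);
        int seen;
        // Raise the recorded peak if this call pushed it higher.
        while (now > (seen = Volatile.Read(ref maxRunning)))
        {
            Interlocked.CompareExchange(ref maxRunning, now, seen);
        }
        Thread.Sleep(20);
        Interlocked.Decrement(ref running);
        return input;
    }

    // Runs the query to completion and returns the peak concurrency seen.
    public static int PeakFor(Func<ParallelQuery<int>> buildQuery)
    {
        running = 0;
        maxRunning = 0;
        buildQuery().Count();
        return maxRunning;
    }

    // AsParallel() last: Select and Where are plain LINQ to Objects operators.
    public static ParallelQuery<int> Trailing()
    {
        return Enumerable.Range(0, 40)
                         .Select(TrackedProjection)
                         .Where(x => x > 10)
                         .AsParallel();
    }

    // AsParallel() first: Select and Where are the PLINQ operators.
    public static ParallelQuery<int> Leading()
    {
        return Enumerable.Range(0, 40)
                         .AsParallel()
                         .Select(TrackedProjection)
                         .Where(x => x > 10);
    }

    public static void Main()
    {
        Console.WriteLine("AsParallel last:  peak concurrency {0}", PeakFor(Trailing));
        Console.WriteLine("AsParallel first: peak concurrency {0}", PeakFor(Leading));
    }
}
```

Both queries return the same count; only where the projection runs differs.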

2
On

It does matter, and not just for performance. The first and second queries below do not produce the same result: they contain the same elements, but the parallel query returns them in a different order. You can have parallel processing and still keep the original order by using AsParallel().AsOrdered(). The third query shows it.

var SlowProjection = new Func<int, int>((input) => { Thread.Sleep(100); return input; });

var Measure = new Action<string, Func<List<int>>>((title, measure) =>
{
    Stopwatch sw = Stopwatch.StartNew();
    var result = measure();
    sw.Stop();
    Console.Write("{0} Time: {1}, Result: ", title, sw.ElapsedMilliseconds);
    foreach (var entry in result) Console.Write(entry + " ");
    Console.WriteLine(); // end the line so each measurement prints on its own row
});

Measure("Sequential", () => Enumerable.Range(0, 30)
    .Select(SlowProjection).Where(x => x > 10).ToList());
Measure("Parallel", () => Enumerable.Range(0, 30).AsParallel()
    .Select(SlowProjection).Where(x => x > 10).ToList());
Measure("Ordered", () => Enumerable.Range(0, 30).AsParallel().AsOrdered()
    .Select(SlowProjection).Where(x => x > 10).ToList());

Result:

Sequential Time: 6699, Result: 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
Parallel Time: 1462, Result: 12 16 22 25 29 14 17 21 24 11 15 18 23 26 13 19 20 27 28
Ordered Time: 1357, Result: 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29

I was surprised that the ordered query was not slower than the unordered one, but the result was consistent across 10+ test runs. I investigated a bit and it turned out to be a "bug" in .NET 4.0. In 4.5, AsParallel() is not slower than AsParallel().AsOrdered().

Reference is here:

http://msdn.microsoft.com/en-us/library/dd460677(v=vs.110).aspx
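
If you want to check the ordering behaviour without reading the output by eye, something like the following sketch might help. The class and method names (OrderCheck, Sequential, Unordered, Ordered) are invented for this example, and the slow projection is left out so it runs instantly. The ordered query should always match the sequential one. The unordered query always contains the same elements, but whether they come back in the original order can vary from run to run.

```csharp
using System;
using System.Linq;

public class OrderCheck
{
    public static int[] Sequential()
    {
        return Enumerable.Range(0, 30).Where(x => x > 10).ToArray();
    }

    // No ordering guarantee: elements may come back in any order.
    public static int[] Unordered()
    {
        return Enumerable.Range(0, 30).AsParallel().Where(x => x > 10).ToArray();
    }

    // AsOrdered() asks PLINQ to preserve the source order in the result.
    public static int[] Ordered()
    {
        return Enumerable.Range(0, 30).AsParallel().AsOrdered().Where(x => x > 10).ToArray();
    }

    public static void Main()
    {
        int[] sequential = Sequential();
        int[] unordered = Unordered();

        Console.WriteLine("Ordered matches sequential: {0}",
                          Ordered().SequenceEqual(sequential));
        Console.WriteLine("Unordered has the same elements: {0}",
                          unordered.OrderBy(x => x).SequenceEqual(sequential));
        // This one can print either True or False.
        Console.WriteLine("Unordered happens to be in order: {0}",
                          unordered.SequenceEqual(sequential));
    }
}
```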