Why does ToList() make a sorted order?

241 Views Asked by At

When I execute the following code:

using static System.Console;

var numbers = ParallelEnumerable.Range(0, 50);

WriteLine("\nNumbers divisible by 5 are:");
var divisibleBy5 = numbers
    .AsParallel()
    .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
    .Where(x => x % 5 == 0);
   // .Where(x => x % 5 == 0)
   //.ToList();
foreach (var number in divisibleBy5)
{
    Write(number + "\t");
}

I can see the unordered result most of the time. Here is a sample output (Expected behavior):

Numbers divisible by 5 are:
0       15      30      40      5       20      35      45      10      25

Now I change the query a little bit (using ToList() now):

var divisibleBy5 = numbers
    .AsParallel()
    .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
    //.Where(x => x % 5 == 0);
    .Where(x => x % 5 == 0)
   .ToList();

Upon running this update, I always see the sorted output:

Numbers divisible by 5 are:
0       5       10      15      20      25      30      35      40      45

Questions:

  1. Is this an expected behavior?

  2. If so, why? Do we have any supporting docs for this?

What am I missing here? Can you please share your thoughts?

2

There are 2 best solutions below

7
dbc On BEST ANSWER

The explanation for the behavior is as follows:

  1. The queries returned by ParallelEnumerable.Range(), AsParallel() and Where() are all of base type ParallelQuery<int>.

  2. Because of that, when you call the extension method ToList() you are actually invoking ParallelEnumerable.ToList<int>(this ParallelQuery<int>) rather than Enumerable.ToList().

  3. ParallelEnumerable.ToList<int>() has a special case for ordered parallel enumerables that preserves order. This is mentioned briefly in the documentation page Query Operators and Ordering:

    Some PLINQ query operators behave differently, depending on whether their source sequence is ordered or unordered. The following table lists these operators:

    Operator Result when the source sequence is ordered Result when the source sequence is unordered
    Range Not applicable (same default as AsParallel) Not applicable
    ToList Ordered results Unordered results
    Where Ordered results Unordered results

    Thus as long as the preceding parts of your query are ordered, Where() and ToList() are documented to preserve that order while GetEnumerable() is not.

    That being said, ParallelEnumerable.Range() is not documented to return an ordered parallel enumerable (the docs seem ambiguous to me), so we can't say for sure that your final list will be ordered. But we can explain why ToList() and foreach might produce different orderings.

To see how this works in practice, we can check the .NET 8 reference source and do some debugging. The source code for ToList() can be found here:

public static List<TSource> ToList<TSource>(this ParallelQuery<TSource> source)
{
    ArgumentNullException.ThrowIfNull(source);

    // Allocate a growable list (optionally passing the length as the initial size).
    List<TSource> list = new List<TSource>();
    IEnumerator<TSource> input;

    if (source is QueryOperator<TSource> asOperator)
    {
        if (asOperator.OrdinalIndexState == OrdinalIndexState.Indexable && asOperator.OutputOrdered)
        {
            return new List<TSource>(ToArray<TSource>(source));
        }

        // We will enumerate the list w/out pipelining.

        // @PERF: there are likely some cases, e.g. for very large data sets,
        //     where we want to use pipelining for this operation. It can reduce memory
        //     usage since, as we enumerate w/ pipelining off, we're already accumulating
        //     results into a buffer. As a matter of fact, there's probably a way we can
        //     just directly use that buffer below instead of creating a new list.

        input = asOperator.GetEnumerator(ParallelMergeOptions.FullyBuffered);
    }
    else
    {
        input = source.GetEnumerator();
    }

Debugging indicates that the concrete type of source is WhereQueryOperator<int> with OrdinalIndexState equal to OrdinalIndexState.Increasing when means that GetEnumerator(ParallelMergeOptions.FullyBuffered) rather than just GetEnumerator() is called. The source for these methods is here:

public override IEnumerator<TOutput> GetEnumerator()
{
    // Buffering is unspecified and  order preservation is not suppressed.
    return GetEnumerator(null, false);
}

public IEnumerator<TOutput> GetEnumerator(ParallelMergeOptions? mergeOptions)
{
    // Pass through the value supplied for pipelining, and do not suppress
    // order preservation by default.
    return GetEnumerator(mergeOptions, false);
}

internal virtual IEnumerator<TOutput> GetEnumerator(ParallelMergeOptions? mergeOptions, bool suppressOrderPreservation)
{
    // Return a dummy enumerator that will call back GetOpenedEnumerator() on 'this' QueryOperator
    // the first time the user calls MoveNext(). We do this to prevent executing the query if user
    // never calls MoveNext().
    return new QueryOpeningEnumerator<TOutput>(this, mergeOptions, suppressOrderPreservation);
}

I.e. the difference between the enumerator used by ToList() and the default enumerator used by foreach is the former passes ParallelMergeOptions.FullyBuffered, which is documented as follows:

Use a merge with full output buffers. The system will accumulate all of the results before making any of them available to the consumer of the query.

This seems to be the cause of the inconsistency you are seeing.

To confirm that the behavior you are seeing is specific to ParallelEnumerable.ToList(), I modified your query to cast to IEnumerable<int> before calling ToList():

IEnumerable<int> divisibleBy5Query = numbers // IEnumerable<int> is implemented by ParallelQuery<int>
    .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
    .Where(x => x % 5 == 0);
var divisibleBy5 = divisibleBy5Query
    .ToList();

When I did so, this resulted in a randomly ordered list.

You can see a demo fiddle showing all my variations of your divisibleBy5 query here: https://dotnetfiddle.net/qBccva.

0
Theodor Zoulias On

If you want to receive the results in the original order, the correct thing to do is to use the AsOrdered PLINQ operator:

ParallelQuery<int> divisibleBy5 = numbers
    .AsParallel()
    .AsOrdered()
    .Where(x => x % 5 == 0);

The ToList PLINQ operator might produce an ordered result accidentally. This is not documented, and it is certainly not something you should rely on (assuming that your goal is to write a program that behaves correctly consistently).