Why is LINQ non-deterministic?


I randomly sorted an IEnumerable. When I repeatedly print the same element, I get a different result each time.

string[] collection = {"Zero", "One", "Two", "Three", "Four"};
var random = new Random();
var enumerableCollection = collection.OrderBy(e => random.NextDouble());

Console.WriteLine(enumerableCollection.ElementAt(0));
Console.WriteLine(enumerableCollection.ElementAt(0));
Console.WriteLine(enumerableCollection.ElementAt(0));
Console.WriteLine(enumerableCollection.ElementAt(0));
Console.WriteLine(enumerableCollection.ElementAt(0));

Every write gives a different random element. Why is the order not preserved?


4 Answers


Many of the previous responses are technically correct, but I think it is useful to look directly at the implementation of Enumerable.

We can see that ElementAt ends up calling GetEnumerator, which in turn yields elements from an iteration that evaluates the current key selector.

If you aren't familiar with yield, see yield.

For the given example, the key selector is

e => random.NextDouble()

This means that each enumeration iterates over collection again (note the yield in the iterator) and, for each element, evaluates the key selector, which executes random.NextDouble().

Therefore, the non-determinism here is due to the non-deterministic key selector. This is expected behavior; the order is only fixed by a LINQ terminal operation (.ToList(), .ToArray(), etc.), which can be confusing if you use the query before calling one of them.
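
If you want to see the selector being re-evaluated, here is a small sketch (the selectorCalls counter exists only for illustration):

using System;
using System.Linq;

string[] collection = {"Zero", "One", "Two", "Three", "Four"};
var random = new Random();
var selectorCalls = 0;

// The key selector runs again on every enumeration, not once up front.
var query = collection.OrderBy(e => { selectorCalls++; return random.NextDouble(); });

Console.WriteLine(query.ElementAt(0)); // first enumeration
Console.WriteLine(query.ElementAt(0)); // second enumeration
Console.WriteLine(selectorCalls);      // 10: the selector ran for all 5 elements on each of the two enumerations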


LINQ defers execution until it is absolutely necessary. You can think of enumerableCollection as the definition of how you want the enumeration to work, rather than the result of a single enumeration.

So each time you enumerate it (which you're doing when you call ElementAt) it will re-enumerate your original collection, and since you've chosen to order randomly, the answer is different each time.

You can make it do what you were expecting by adding .ToList() on the end:

var enumerableCollection = collection.OrderBy(e => random.NextDouble()).ToList();

This performs the enumeration once and stores the result in a List. With that in place, the original collection won't be re-enumerated each time you enumerate enumerableCollection.
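
To make the difference visible, here is a small sketch; the Logged helper is hypothetical and only exists to print a line whenever the source is actually walked:

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: prints a message each time the wrapped source is enumerated.
IEnumerable<string> Logged(IEnumerable<string> source)
{
    Console.WriteLine("enumerating source");
    foreach (var item in source)
        yield return item;
}

string[] collection = {"Zero", "One", "Two", "Three", "Four"};
var random = new Random();
var list = Logged(collection).OrderBy(e => random.NextDouble()).ToList();

Console.WriteLine(list.ElementAt(0)); // same element both times,
Console.WriteLine(list.ElementAt(0)); // and "enumerating source" was printed only once (by ToList)

Drop the ToList() and the message appears on every ElementAt call instead, because each call re-enumerates the query.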


The simple answer is that you are specifically forcing a non-deterministic situation. The why comes down to how LINQ works.

At its core, LINQ is designed around the concept of building objects that represent operations on a set of data - objects, records, etc. - and then executing those operations at enumeration time. When you call OrderBy, the return value is an object that will process the source to perform the ordering, not a new collection that is already ordered. The ordering happens only when you actually enumerate the data - by calling ElementAt in this case.

Each time you enumerate the IEnumerable<string> object it runs the sequence of operations you specified, creating a random ordering of the elements. Each ElementAt(0) call will therefore return the first element of a new random sequence.
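
For example, enumerating the same query twice in a row will typically print two different orderings:

using System;
using System.Linq;

string[] collection = {"Zero", "One", "Two", "Three", "Four"};
var random = new Random();
var query = collection.OrderBy(e => random.NextDouble());

// Each string.Join call enumerates the query again, producing a fresh shuffle.
Console.WriteLine(string.Join(",", query));
Console.WriteLine(string.Join(",", query));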

Essentially what you appear to be misunderstanding is this: an IEnumerable<T> is not a collection.

You can however convert the IEnumerable<string> to a collection using ToArray() or ToList(). These will take the output of the operation sequence specified by the IEnumerable<> and create a new collection - string[] or List<string> respectively.

For instance:

string[] collection = {"Zero", "One", "Two", "Three", "Four"};
var random = new Random();
var enumerableCollection = collection.OrderBy(e => random.NextDouble()).ToArray();

Console.WriteLine(enumerableCollection.ElementAt(0));
Console.WriteLine(enumerableCollection.ElementAt(0));
Console.WriteLine(enumerableCollection.ElementAt(0));
Console.WriteLine(enumerableCollection.ElementAt(0));
Console.WriteLine(enumerableCollection.ElementAt(0));

Now you'll get a random item from the original array, but it will be the same item every time you read it. Instead of re-randomizing on each read, you are just reading the first item of a new collection whose order was randomized once.


I am glad the question is here. While the concept of deferred execution is well known, people still make the mistake of enumerating an IEnumerable multiple times; even Eric Lippert did not avoid it in https://ericlippert.com/2014/10/20/producing-combinations-part-three/. Doing so may, and eventually will, break your code. If you need to enumerate an IEnumerable multiple times, materialize it (for example with ToList).

Edit

With all respect to the authority of Eric Lippert, let me show how bad multiple enumeration can be. Here is the correct result, based on a list:

Zero,One,Two
Zero,One,Three
Zero,One,Four
Zero,Two,Three
Zero,Two,Four
Zero,Three,Four
One,Two,Three
One,Two,Four
One,Three,Four
Two,Three,Four

and here is what happens when we process a sequence with an unstable order:

Three,One,Four
Four,Two,One
One,Zero,Three
One,Zero,Four
Three,Two,Zero
One,Four,Zero
Two,Four,One
Two,Three,Four
Two,Three,Four
Four,Three,One

and to the points raised in response:

  1. one of the enumerations was to count the sequence; a sequence that does not have a stable count when enumerated multiple times is a sequence that you should not attempt to make combinations out of!
  2. counting does not necessarily enumerate the sequence.
  3. in that series I strongly encourage the use of immutable data structures which are always safe and performant when enumerated multiple times.

and my replies:

  1. I agree that trying to make combinations of a sequence with an unstable count is questionable. But that is not the case here. The sequence has a given count and guaranteed items; they are just randomly ordered (the sketch after this list reproduces the effect). Imagine it is the result of some fancy HashSet that can rearrange its internal structure between enumerations for optimization reasons.
  2. That is true. That possibility is there. But is that really an argument?
  3. If that is a contract, ok. You expect a sequence that you can enumerate as many times as you wish and always get the same result. Stating that explicitly would avoid confusion; not every sequence has this property.
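
To reproduce the effect, here is a deliberately naive sketch. The Combinations helper below is hypothetical (it is not the code from the linked article); it simply enumerates its source once for the count and again for every element it picks, which is exactly the multiple-enumeration pattern that breaks on an unstably ordered sequence:

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, deliberately naive helper: yields all k-element combinations by index,
// re-enumerating the source through Count() and repeated ElementAt() calls.
IEnumerable<string[]> Combinations(IEnumerable<string> source, int k)
{
    int n = source.Count();                           // one enumeration
    var indices = Enumerable.Range(0, k).ToArray();
    while (true)
    {
        yield return indices.Select(i => source.ElementAt(i)).ToArray(); // more enumerations
        int pos = k - 1;
        while (pos >= 0 && indices[pos] == n - k + pos) pos--;
        if (pos < 0) yield break;
        indices[pos]++;
        for (int j = pos + 1; j < k; j++) indices[j] = indices[j - 1] + 1;
    }
}

string[] items = {"Zero", "One", "Two", "Three", "Four"};
var random = new Random();

// Stable source (the plain array): prints the ten correct combinations, as in the first listing.
foreach (var c in Combinations(items, 3))
    Console.WriteLine(string.Join(",", c));

Console.WriteLine("---");

// Unstable source: every ElementAt() call sees a freshly shuffled sequence,
// so the triples come out garbled, as in the second listing.
foreach (var c in Combinations(items.OrderBy(e => random.NextDouble()), 3))
    Console.WriteLine(string.Join(",", c));

Adding ToList() after the OrderBy fixes the second loop as well: the order is then shuffled once and stays stable across enumerations.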