Side effects executed less than expected (LINQ)

75 Views Asked by At

Here is a test using NUnit style:

[Test]
public void Test()
{
    var i = 0;
    var v = Enumerable.Repeat(0, 10).Select(x =>
    {
        i++;
        return x;
    }).Last();

    Assert.That(v, Is.EqualTo(0));
    Assert.That(i, Is.EqualTo(10));
}

Unexpectedly, it fails:

Message:
Expected: 10
But was:  1

It is a big surprise that the side effect to increase i only performed once, not ten times. So I try to replace those LINQ methods with my own intuitive/naive implementation:

private T MyLast<T>(IEnumerable<T> values)
{
    var enumerator = values.GetEnumerator();
    while (enumerator.MoveNext())
    {
    }
    return enumerator.Current;
}

private IEnumerable<T> MyRepeat<T>(T value, int count)
{
    for(var i = 0; i<count; ++i)
    {
        yield return value;
    }
}

I omit the changed code; but you can verify that, if the code use MyRepeat rather than Enumerable.Repeat, OR use MyLast rather than Enumerable.Last, the test is passing. So obviously both methods are implemented differently.

(The above were tested in .NET 6, but the original observation was in a piece of code using .NET Core 3.1)

So my question is: how LINQ implemented those methods in a way that causes such weird behavior?

2

There are 2 best solutions below

0
Theodor Zoulias On BEST ANSWER

The .NET Core LINQ implementation includes various optimizations for known types of enumerable sequences. So some operators like the Last and the ElementAt might use a shorter path to return the result, instead of enumerating the sequence element by element. For example queries for the List<T> can be optimized, because this collection offers indexed access to its elements, while the Queue<T> does not. Apparently sequences produced by the Enumerable.Repeat and the Enumerable.Range can be optimized too. Here is another example that reproduces your observations:

Test(new List<int>(Enumerable.Range(0, 10)));
Test(new Queue<int>(Enumerable.Range(0, 10)));
Test(Enumerable.Range(0, 10));

static void Test(IEnumerable<int> source)
{
    int iterations = 0;
    int result = source.Select(x => { iterations++; return x; }).ElementAt(5);
    Console.WriteLine($"{source}, result: {result}, Iterations: {iterations}");
}

Output

System.Collections.Generic.List`1[System.Int32], result: 5, Iterations: 1
System.Collections.Generic.Queue`1[System.Int32], result: 5, Iterations: 6
System.Linq.Enumerable+RangeIterator, result: 5, Iterations: 1

Online demo.

The performance optimizations for the Enumerable.Repeat are located in this source code file:

1
Earth Engine On

The other answer is useful, but it was not fully answering "how", rather it is about "why" (optimization).

I dig deeper into it and here is my findings.

  • All the methods Enumerable.Repeat, Enumerable.Select and Enumerable.Last are critical for LinQ to do this optimization. So replacing any of them will invalidate the optimization and make the test passing.
  • The Enumerable.Repeat function gives a type marked as suitable for optimization, and provided the final implementation.
  • The Enumerable.Select function checks the source type is suitable for optimization and if so, delegate the optimization to it. Its return type is also marked as suitable for optimization.
  • The Enumerable.Last function checks the source type, and if it is suitable for optimization, delegate to the optimization.

There are some easy ways to bypass this optimazation. For example, in the test case, if the only thing that matter is how many iterations has been done, we can write

[Test]
public void Test()
{
    var i = 0;
    var v = Enumerable.Repeat(0, 10).Select(x =>
    {
        i++;
        return x;
    }).Append(0).Last();

    Assert.That(v, Is.EqualTo(0));
    Assert.That(i, Is.EqualTo(10));
}

This will pass. If the Last element is indeed to be calculated in the Lambda and so can't be determined in advance, write

[Test]
public void Test()
{
    var i = 0;
    var v = Enumerable.Repeat(0, 10).Select(x =>
    {
        i++;
        return x + i;
    }).Concat(new int[] {}).Last();

    Assert.That(v, Is.EqualTo(10));
    Assert.That(i, Is.EqualTo(10));
}

Lesson: Think again when using side effects in LinQ

This issue proves that LinQ was not designed to work with lambdas that has side effects. So in general, we should NOT write LinQ lambdas that have side effects. Therefore this code pattern is automatically a bad taste of code, even when it might worked just fine in your code.