Performance and Concept questions in LINQ


I have some questions / concerns about using LINQ in my projects. First question: is there a difference in performance between the old (select Item from..) LINQ and the new version (.Select(r => ..))?

Second question: how are LINQ expressions translated (and into what)? Will they be translated to the old syntax first and then to something else (intermediate language)?


There are 2 best solutions below

On BEST ANSWER

There isn't any difference between the two ways we can write a LINQ query.

Specifically, this

var adults = from customer in customers
             where customer.Age > 18
             select customer;

is equivalent to this:

var adults = customers.Where(customer => customer.Age > 18);

In fact, the compiler translates the first query into the second. The first way of writing a LINQ query is syntactic sugar. If you compile your code and then use a disassembler to inspect the IL, you will see that the query has been compiled to the same method calls as the second form above.

Queries written the first way are said to use query syntax, while queries written the second way are said to use fluent syntax (also known as method syntax).
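To make the equivalence concrete, here is a minimal sketch that runs the same filter in both syntaxes and compares the results. The Customer type, the sample data and the method names are made up for illustration; they are not from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SyntaxEquivalenceDemo
{
    // Hypothetical sample type, only here to give the queries something to work on.
    public class Customer
    {
        public string Name { get; set; }
        public int Age { get; set; }
    }

    public static List<Customer> Sample() => new List<Customer>
    {
        new Customer { Name = "Ann", Age = 34 },
        new Customer { Name = "Bob", Age = 17 },
        new Customer { Name = "Cy",  Age = 18 },
        new Customer { Name = "Dee", Age = 52 },
    };

    // Query syntax.
    public static IEnumerable<string> AdultsQuerySyntax(IEnumerable<Customer> customers) =>
        from customer in customers
        where customer.Age > 18
        select customer.Name;

    // Fluent (method) syntax: the form the compiler rewrites the query above into.
    public static IEnumerable<string> AdultsMethodSyntax(IEnumerable<Customer> customers) =>
        customers.Where(customer => customer.Age > 18)
                 .Select(customer => customer.Name);

    public static void Main()
    {
        var customers = Sample();
        var viaQuery = AdultsQuerySyntax(customers).ToList();
        var viaMethods = AdultsMethodSyntax(customers).ToList();

        Console.WriteLine(string.Join(", ", viaQuery));        // Ann, Dee
        Console.WriteLine(viaQuery.SequenceEqual(viaMethods)); // True
    }
}
```

Since both methods compile to the same Where and Select calls, any timing difference you measure between them should be noise rather than a property of the syntax.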

On

Is there a difference in performance between the old (select Item from..) LINQ and the new version (.Select(r => ..))?

Neither of these is older than the other, as both came into the language at the same time. If anything, .Select() could be argued to be the older: although the method call will almost always be a call to an extension method (and hence only available since .NET 3.5, and only callable that way since C# 3.0), method calls in general have existed since 1.0.

There's no difference in performance, as they are different ways to say the same thing. (It's just about possible that you could find a case that resulted in a redundancy for one but not the other, but for the most part those redundancies are caught by the compiler and removed).

How are LINQ expressions translated (and into what)? Will they be translated to the old syntax first and then to something else (intermediate language)?

Consider that, as per the above, from item in someSource select item.ID and someSource.Select(item => item.ID) are the same thing. The compiler has to do two things:

  1. Determine how the call should be made.
  2. Determine how the lambda should be used in that.

These two go hand in hand. The first part is the same as with any other method call:

  1. Look for a method defined on the type of someSource that is called Select() and takes one parameter of the appropriate type (I'll come to "appropriate type" in a minute).

  2. If no method is found, look for a method defined on the immediate base of the type of someSource, and so on until you have no more base classes to examine (after reaching object).

  3. If no method is found, look for an extension method, defined on a static class that is made available by a using directive, whose first (this) parameter is the type of someSource and whose second parameter is of the appropriate type that I said I'll come back to in a minute.

  4. If no method is found, look for a generic extension method that can accept the types of someSource and the lambda as parameters.

  5. If no method is found, do the above two steps for the base types of someSource and interfaces it implements, continuing to further base types or interfaces those interfaces extend.

  6. If no method is found, raise a compiler error. Likewise, if any of the above steps finds two or more equally applicable methods in the same step, raise a compiler error.

So far this is the same as how "".IsNormalized() calls the IsNormalized() method defined on string, "".GetHashCode() calls the GetHashCode() method defined on object (though a later step means the override defined on string is what is actually executed) and "".GetType() calls the GetType() method defined on object.

Indeed we can see this in the following:

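using System;

public class WeirdSelect
{
  // An applicable instance method named Select is all that query syntax needs.
  public int Select<T>(Func<WeirdSelect, T> ignored)
  {
    Console.WriteLine("Select Was Called");
    return 2;
  }
}
void Main()
{
  // Compiles to new WeirdSelect().Select(whatever => whatever), so result is 2.
  int result = from whatever in new WeirdSelect() select whatever;
}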

Here because WeirdSelect has its own applicable Select method, that is executed instead of one of the extension methods defined in Enumerable and Queryable.
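The same preference for instance methods can be seen even when the type is a perfectly good IEnumerable<T>. The following is a rough sketch with a made-up LoudList type (not part of the framework) that implements IEnumerable<int> and also declares its own Select. The instance method wins for the query expression, and the Enumerable extension method is only reached once the static type is IEnumerable<int>:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: a sequence that also declares its own Select instance method.
public class LoudList : IEnumerable<int>
{
    private readonly List<int> items = new List<int> { 1, 2, 3 };

    public IEnumerator<int> GetEnumerator() => items.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // Instance method: considered before any extension method is looked at.
    public string Select<T>(Func<int, T> selector) => "instance Select";
}

public static class InstanceFirstDemo
{
    // Binds to LoudList.Select, so the "query" evaluates to a string.
    public static string ViaQuerySyntax()
    {
        string result = from n in new LoudList() select n + 1;
        return result;
    }

    // With the static type IEnumerable<int>, only Enumerable.Select is applicable.
    public static int SumViaExtension()
    {
        IEnumerable<int> plain = new LoudList();
        return (from n in plain select n + 1).Sum();
    }

    public static void Main()
    {
        Console.WriteLine(ViaQuerySyntax());  // instance Select
        Console.WriteLine(SumViaExtension()); // 9
    }
}
```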

Now, I hand-waved over "parameter of the appropriate type" above because the one complication that lambdas bring into this is that a lambda in C# code can be turned into either a delegate (in this case a Func<TSource, TResult>, where TSource is the type of the lambda's parameter and TResult the type of the value it returns) or an expression (in this case an Expression<Func<TSource, TResult>>) in the produced CIL code.

As such, the method call resolution is looking for either a method that will accept a Func<TSource, TResult> (or a similar delegate) or one that will accept an Expression<Func<TSource, TResult>> (or a similar expression). If it finds both at the same stage in the search there will be a compiler error, hence the following will not work:

using System;
using System.Linq.Expressions;

public class WeirdSelect
{
  public int Select<T>(Func<WeirdSelect, T> ignored)
  {
    Console.WriteLine("Select Was Called");
    return 2;
  }
  public int Select<T>(Expression<Func<WeirdSelect, T>> ignored)
  {
    Console.WriteLine("Select Was Called on expression");
    return 1;
  }
}
void Main()
{
  // Compiler error: the lambda converts equally well to both overloads, so the call is ambiguous.
  int result = from whatever in new WeirdSelect() select whatever;
}

Now, 99.999% of the time we are using select either with something that implements IQueryable<T> or with something that implements IEnumerable<T>. If it implements IQueryable<T>, then the method call resolution will find public static IQueryable<TResult> Select<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector) defined in Queryable, and if it implements IEnumerable<T> it will find public static IEnumerable<TResult> Select<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector) defined in Enumerable. It doesn't matter that IQueryable<T> derives from IEnumerable<T>, because the Queryable method matches the source type exactly and so is chosen in preference to the Enumerable one, which only matches through the base interface.
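A small sketch of that choice, under the assumption that the static type of the source is what decides it. The Kind helpers are hypothetical and exist only to report which Select was bound, judging by the static type of the query result:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BindingDemo
{
    // Hypothetical helpers: overload resolution on the argument's static type
    // reveals whether the query produced an IQueryable<T> or only an IEnumerable<T>.
    public static string Kind<T>(IQueryable<T> source) => "Queryable.Select";
    public static string Kind<T>(IEnumerable<T> source) => "Enumerable.Select";

    public static void Main()
    {
        int[] numbers = { 1, 2, 3 };
        IEnumerable<int> asEnumerable = numbers;
        IQueryable<int> asQueryable = numbers.AsQueryable();

        // Same query text, different static source types.
        var fromEnumerable = from n in asEnumerable select n * 2;
        var fromQueryable = from n in asQueryable select n * 2;

        Console.WriteLine(Kind(fromEnumerable)); // Enumerable.Select
        Console.WriteLine(Kind(fromQueryable));  // Queryable.Select
    }
}
```

Note that it is the compile-time type that matters here: storing the same queryable in a variable typed IEnumerable<int> would bind the query to Enumerable.Select instead.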

Therefore 99.999% of the time there will be a call made to one of those two extension methods. In the IQueryable<T> case the lambda is turned into some code that produces an appropriate Expression, which is then passed to the method (the query engine is then able to turn that into whatever code is appropriate, e.g. creating appropriate SQL queries if it's a database-backed query engine, or something else otherwise). In the IEnumerable<T> case the lambda is turned into an anonymous delegate, which is passed to the method, which works a bit like:

public static IEnumerable<TResult> Select<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)
{
  //Simplifying a few things, this code is to show the idea only
  foreach(var item in source)
    yield return selector(item);
}
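To see the two forms a lambda can take, here is a minimal sketch that assigns the same lambda text to a delegate and to an expression tree. The class and field names are illustrative only:

```csharp
using System;
using System.Linq.Expressions;

public static class LambdaFormsDemo
{
    // The same lambda text, compiled two different ways.
    public static readonly Func<int, int> AsDelegate = x => x * 2;
    public static readonly Expression<Func<int, int>> AsExpression = x => x * 2;

    public static void Main()
    {
        // The delegate is ordinary executable code.
        Console.WriteLine(AsDelegate(21)); // 42

        // The expression is data describing the code; a query provider can
        // inspect it (for example, to build SQL) instead of running it.
        Console.WriteLine(AsExpression.Body.NodeType); // Multiply

        // An expression tree can still be compiled into a delegate on demand.
        Console.WriteLine(AsExpression.Compile()(21)); // 42
    }
}
```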

To come back to your question:

Will it be translated to old syntax first and then to something else (intermediate language)?

You could think of the newer from item in source select… syntax as being "turned into" the older source.Select(…) syntax (though it is not really older, since it depends on extension methods over 99% of the time), because that makes the method call a bit clearer, but really they amount to the same thing. In the CIL produced, the differences depend on whether the call was to an instance method or (as is almost always the case) an extension method, and even more so on whether the lambda is used to produce an expression or a delegate.