Why is EF Core parametrising with incorrect data types?

462 Views Asked by At

I've been playing with .NET 8 RC2 and I have not come across this before in EF Core.

I have created a dynamic query and part of the query is this:

_ordersQuery = _ordersQuery.Where(x => !stationIds.Contains(x.StationNo));

stationIds is of type List<short>.

When I do .ToQueryString() on the query I notice this:

DECLARE @__stationIds_7 nvarchar(4000) = N'[117,116,98]';

then in the actual WHERE part of the SQL query I see this

NOT IN (
   SELECT [s0].[value]
   FROM OPENJSON(@__stationIds_7) WITH ([value] smallint '$') AS [s0]

Why is it being treated as JSON?

3

There are 3 best solutions below

2
On BEST ANSWER

That's a smart idea! The type is definitely not wrong. Instead of generating a hard-coded IN, the list is passed as a JSON array, deserialized with OPENJSON (a very fast method) and used with NOT IN.

Until now, list.Contains or !list.Contains were transformed to hard-coded SQL clauses IN (@id1, @id2, ....) or NOT IN (...). The number of items had to be known in advance. Now, while the server can take advantage of the statistics of the IN (...) clause to calculate how many matches there may be, a NOT IN (..) is almost useless. If there are 1M IDs and 100 items in NOT IN, the server will still have to scan the entire table for the other 9999000 matching items.

Another concern is that a varying number of parameters in IN (...) would result in multiple execution plans, not only filling the cache with single-use plans but evicting other useful plans

This was announced in EF Core 8 Preview 4, in the Translating LINQ Contains with a parameter collection section and addressed a pretty serious performance issue, especially for long-lived web sites (emphasis mine):

But crucially, the negative performance impact of constantly varying SQLs goes beyond this particular query. SQL Server (and Npgsql) can only cache a certain number of SQLs; at some point, they have to get rid of old entries to avoid using too much memory. If you frequently use Contains with a variable array, each individual invocation causes valuable cache entries to be taken at the database, for SQLs that will most probably never be used (since they have the specific array values baked in). That means you’re also evicting cache entries for other, important SQLs that will need to be used, and requiring them to be re-planned again and again.

As the docs say, this was the second highest voted performance issue in the EF Core repo.

1
On
NOT IN (
   SELECT [s0].[value]
   FROM OPENJSON(@__stationIds_7) WITH ([value] smallint '$') AS [s0]

.NET has made some improvements to its SQL generation, I'm assuming that one of them would be how to handle lookups in lists.

From your post, it looks like it's passing the list of integer values a json list.

"FROM OPENJSON()" is a json parsing function within SQL Server, "WITH ()" instructs sql what the data structure within the json looks like, the '$' points to the root object.

So from the above sql string, it parses the json '[117,116,98]', instructing SQL that the root object (each record within that list) to be of type smallint.

Not sure if this helps :)

1
On

TL;DR

This is translation change introduced in EF Core 8 to tackle performance problems in MS SQL Server provider, which had lead to query plan caching problems.

Note that it can break the query execution if you have older SQL Server or compatibility level set for database. In this case for MSSQL you can run:

SELECT compatibility_level 
FROM sys.databases 
WHERE name = 'MyDBName';

And either upgrade the database or use the value when setting up the DbContext options:

optionsBuilder
        .UseSqlServer(@"<CONNECTION STRING>", o => o.UseCompatibilityLevel(thValue));

Details

Checkout breaking changes for EF Core 8, particularly the Contains in LINQ queries may stop working on older SQL Server versions:

Old behavior Previously, when the Contains operator was used in LINQ queries with a parameterized value list, EF generated SQL that was inefficient but worked on all SQL Server versions.

New behavior Starting with EF Core 8.0, EF now generates SQL that is more efficient, but is unsupported on SQL Server 2014 and below.

Note that newer SQL Server versions may be configured with an older compatibility level, also making them incompatible with the new SQL. This can also occur with an Azure SQL database which was migrated from a previous on-premises SQL Server instance, carrying over the old compatibility level.

And WHY:

The previous SQL generated by EF Core for Contains inserted the parameterized values as constants in the SQL. For example, the following LINQ query:

var names = new[] { "Blog1", "Blog2" };

var blogs = await context.Blogs
    .Where(b => names.Contains(b.Name))
    .ToArrayAsync();

would be translated to the following SQL:

SELECT [b].[Id], [b].[Name]
FROM [Blogs] AS [b]
WHERE [b].[Name] IN (N'Blog1', N'Blog2')

Such insertion of constant values into the SQL creates many performance problems, defeating query plan caching and causing unneeded evictions of other queries. The new EF Core 8.0 translation uses the SQL Server OPENJSON function to instead transfer the values as a JSON array. This solves the performance issues inherent in the previous technique; however, the OPENJSON function is unavailable in SQL Server 2014 and below.