SQL Linq with sub-entities

152 Views Asked by At

Let's say you have tables Customer and Purchases and you want to return a DTO of all customers from the city of Knuffingen and their purchaes in 2023.

var customers = context.Customer
  .Where(c => c.City = "Knuffingen")
  .ToList();
foreach (var customer in customers)
{
  var purchases = customer.Purchases
    .Where(p => p.Year == 2023)
    .ToList();
  yield return new DTO(customer, purchases);
}

would obviously a bad idea because every loop iteration would query a new request against the database.

Currently I'm doing something like this:

var customers = context.Customer
  .Where(c => c.City = "Knuffingen")
  .ToList();
var customerIds = customer
  .Select(c => c.Id)
  .ToList();
var purchases = context.Purchases
  .Where(p => customerIds.Contains(p.CustomerId)
    && p.Year == 2023)
  .ToList();
foreach (var customer in customers)
{
  var customerPurchases = purchases
    .Where(p => p.CustomerId == customer.Id)
    .ToList()
  yield return new DTO(customer, customerPurchases);
}

but that needs to take into consideration the max amount of query parameters and if necessary split up the query into multiple segments.

I was wondering if there was a better way, for example like this:

var customerQueryable = context.Customers
  .Where(c => c.City == "Knuffingen");
var customers = customerQueryable
  .ToList();
var purchases = customerQueryable
  .SelectMany(c => c.Purchases
    .Where(p => p.Year == 2023))
  .ToList();
foreach (var customer in customers)
{
  var customerPurchases = purchases
    .Where(p => p.CustomerId == customer.Id)
    .ToList()
  yield return new DTO(customer, customerPurchases);
}

but if you have more things than just purchases dependent on the Customer entity and your initial filter is a bit more complex I'm not sure it wouldn't put a bigger strain on the database to query all the customers over and over and over again.

1

There are 1 best solutions below

3
On

You are right, querying customers over and over is performance heavy and unnecessary.

What you achieved with your 2nd and 3rd example is a split query. You can do that better if you go with your 1st example and change only the execution (from lazy loading to eager) with either:

  • Include() - loads related table data in left join manner,
  • filtered include (good but sensitive),
  • projection.

And you need to fix the where clause to check if the Customers also purchased in 2023 and not just live in Knuffingen:

var matchingCustomers = context.Customers
    .Where(c => c.City == "Knuffingen" && c.Purchases.Any(p => p.Year == 2023))

Solution with .Include:

The actual number of database roundtrips (query executions) will depend on QuerySplittingBehavior. A join sounds very memory heavy because of cartesian explosion but using AsSplitQuery() makes it perform good while keeps the code clean and readable.

var matchingCustomers = context.Customers
    .Where(c => c.City == "Knuffingen" && c.Purchases.Any(p => p.Year == 2023))
    .Include(c => c.Purchases)
    .AsSplitQuery(); // +1 roundtrip per Include but no cartesian-explosion

foreach (var c in matchingCustomers)
{
    var purchases = c.Purchases
        .Where(p => p.Year == 2023)
        .ToList(); // no roundtrip to DB
    yield return new DTO(c, purchases);
}

To reduce iterations, you can skip ToList() (See IQueryable streaming explanation) and continue directly with foreach or replace the foreach with a select:

    .Select(c => new DTO(c, c.Purchases.Where(p => p.Year == 2023).ToList()));

Solution with Projection:

Both above (foreach/select) will select all columns of the filtered Customers and all columns of their Purchases in order to provide all necessary data for the DTO constructor. EF preprocessors cannot assume which property is really consumed by the constructor see examples.

If you checked the examples of the link above, you already know how to fasten up the query: turn your select into a projection:

Expression<Func<Purchase, bool>> purchasedIn2023 = p => p.Year == 2023;

return context.Customers
    .Where(c => c.City == "Knuffingen" && c.Purchases.Any(purchasedIn2023))
    .AsSplitQuery() // should work the same: +1 roundtrip per Include (not per customer) but please test what query is generated
    .Select(c => new DTO()
    {
        CustomerName = c.Name,
        // projections ignores includes, all (customer related) purchases can be accessed here
        TotalCost = c.Purchases.Where(purchasedIn2023).Sum(p => p.CostAmount)
    });

With this, only the projected columns are retrieved from the database (c.Name, p.CostAmount).

I have to mention: Not every Linq function can be transpiled, e.g. Average() will work, string.Join() will not. Check supported aggregate methods.


Solution with filtered include:

Since we cannot use projection and filtered include together, we are back to having a list and retrieving all columns of Customer and Purchase from DB. The bright side: this only loads the necessary Purchases. Choose this when you need most columns from both tables but very few Purchases of each Customer, otherwise use projection.

var matchingCustomers = context.Customers
    .Where(c => c.City == "Knuffingen" && c.Purchases.Any(purchasedIn2023))
    .Include(c => c.Purchases.Where(purchasedIn2023))
    .AsSplitQuery()
    .ToList();

foreach (var c in matchingCustomers)
{
    // c.Purchases contains only the ones made in 2023 IF no EF fix-ups happened (no purchases has been loaded earlier - can be achieved with transient service call)
    // change the constructor to accept ICollection instead of List
    yield return new DTO(c, c.Purchases);
}

You can use further filtering, for example take the 5 most expensive purchase only:

    .Include(c => c.Purchases.Where(purchasedIn2023).OrderByDescending(p => p.CostAmount).Take(5))

As for your 2nd question

"what happens if there is more dependency to finding Customers than just Purchases" I see 2 approaches:

  1. You have only 1 or 2 customer filtering logic, so you decide to keep each logic inside a separate function and you modify the Where clause, extend the Include or Select as necessary.

  2. You have a lot of different filtering logic, so you decide to introduce Predicate (filter expression) to your function.

     IEnumerable<DTO> GetCustomers(Expression<Func<Customer, Customer>> filter)
     {
         return context.Customers
             .Where(filter)
             .AsSplitQuery()
             .Select(c => new DTO()
             {
                 CustomerName = c.Name,
                 // I changed the select to be independent from the predicate
                 LastOrder = c.Purchases.OrderByDescending(p => p.Id).First()
             });
     }
    
     // call like:
     GetCustomers(c => c.City == "Knuffingen" && c.Purchases.Any(p => p.Year == 2023))
    

    And this can go very deep in terms of reusability:

     IEnumerable<DTO> GetCustomers(
         Expression<Func<Customer, Customer>> filter,
         Expression<Func<Customer, DTO>> selector)
     {
         return context.Customers
             .Where(filter)
             .AsSplitQuery()
             .Select(selector);
             // no need to call AsEnumerable(), it is cast implicitly
             // however return ICollection or List if the result iterated more than once!
     }
    
     ICollection<DTO> GetCustomers(
         Expression<Func<Customer, Customer>> filter,
         Func<IQueryable<Customer>, IQueryable<Customer>> include)
     {
         var customers = include(context.Customers)
             .Where(filter)
             .AsSplitQuery()
             .ToList();
     }
    
     // call like:
     GetCustomers(
         filter: c => c.City == "Knuffingen" && c.Purchases.Any(p => p.Year == 2023)),
         selector: c => new DTO() { CustomerName = c.Name, TotalCost = c.Purchases.Sum(p => p.CostAmount) });
    
     GetCustomers(
         filter: c => c.City == "Knuffingen" && c.Purchases.Any(p => p.Year == 2023)),
         include: c => c.Include(c => c.Purchases.Where(p => p.Year == 2023))
                      // possible future relation
                      //.Include(c => c.PersonalInfo.Where(i => i.BornYear >= 2005))
     );