optimizing SQL counts

57 Views Asked by At

I have to select a list of Catalogs from one table, and perform counts in two other tables: Stores and Categories. The counters should show how many Stores and Categories are linked to each Catalog. I have managed to get the functionality I need using this SQL query:

 SELECT `catalog`.`id` AS `id`,
       `catalog`.`name` AS `name`,
       (
            SELECT COUNT(*)
              FROM `category`
              WHERE `category`.`catalog_id` = `catalog`.`id`
               AND `category`.`is_archive` = 0
               AND `category`.`company_id` = 2
       ) AS `category_count`,
       (
            SELECT COUNT(*)
              FROM `store`
              WHERE `store`.`catalog_id` = `catalog`.`id`
               AND `store`.`is_archive` = 0
               AND `store`.`company_id` = 2
       ) AS `store_count`
  FROM `catalog`
 WHERE `catalog`.`company_id` = 2
   AND `catalog`.`is_archive` = 0
 ORDER BY `catalog`.`id` ASC;

This works as expected. But I don't like to perform sub-queries, as they are slow and this query may perform badly on LARGE lists.. Is there any method of optimizing this SQL using JOINs? Thanks in advance.

2

There are 2 best solutions below

0
On BEST ANSWER

You can make this a lot faster by refactoring the dependent subqueries in your SELECT clause into, as you mention, JOINed aggregate subqueries.

The first subquery you can write this way.

                SELECT COUNT(*) num, catalog_id, company_id
                  FROM category
                 WHERE is_archive = 0
                 GROUP BY catalog_id, company_id

The second one like this.

                SELECT COUNT(*) num, catalog_id, company_id
                  FROM store
                 WHERE is_archive = 0
                 GROUP BY catalog_id, company_id

Then, use those in your main query aas if they were tables containing the counts you want.

SELECT catalog.id,
       catalog.name,
       category.num category_count,
       store.num store_count
  FROM catalog
  LEFT JOIN (
            SELECT COUNT(*) num, catalog_id, company_id
              FROM category
             WHERE is_archive = 0
             GROUP BY catalog_id, company_id
       ) category  ON catalog.id = category.catalog_id
                  AND catalog.company_id = category.company_id
  LEFT JOIN (
            SELECT COUNT(*) num, catalog_id, company_id
              FROM store
             WHERE is_archive = 0
             GROUP BY catalog_id, company_id
       ) store  ON catalog.id = store.catalog_id
               AND catalog.company_id = store.company_id
 WHERE catalog.is_archive = 0
   AND catalog.company_id = 2
 ORDER BY catalog.id ASC;

This is faster than your example because each subquery need only run once, rather than once per catalog entry. It also has the nice feature that you only need say WHERE catalog.company_id = 2 once. The MySQL optimizer knows what to do with that.

I suggest LEFT JOIN operations so you'll still see catalog entries even if they're not mentioned in your category or store tables.

2
On

Subqueries are fine, but you can simplify your query:

SELECT c.id, c.name,
       COUNT(*) OVER (PARTITION BY c.catalog_id) as category_count,
       (SELECT COUNT(*)
        FROM store s
        WHERE s.catalog_id = s.id AND
              s.is_archive = 0 AND
              s.company_id = c.company_id
       ) AS store_count
FROM catalog c
WHERE c.company_id = 2 AND c.is_archive = 0
ORDER BY c.id ASC;

For performance, you want indexes on:

  • catalog(company_id, is_archive, id)
  • store(catalog_id, company_id, is_archive)

Because of the filtering in the outer query, a correlated subquery is probably the best performing way to get the results from store.

Also note some changes to the query:

  • I removed the backticks. They are unnecessary and just clutter the query.
  • An expression like c.id as id is redundant. The expression is given id as the alias anyway.
  • I changed the s.company_id = 2 to s.company_id = c.company_id. It seems like a correlation clause.