MySQL Group By and Count Performance

54 Views Asked by At

I have two tables

  1. Export
Name     Type           Collation        Attributes    Null   Default   Extra
id       int(10)        utf8_unicode_ci  UNSIGNED      No     None      AUTO_INCREMENT
email    varchar(150)   utf8_unicode_ci                No     None  
city_id int(11)         utf8_unicode_ci                Yes    NULL      

Indexes

              Type   Unique Packed  Column  Cardinality Collation     Null
id            BTREE  Yes        No      id      769169      A           No
email_index   BTREE  Yes        No     email    769169      A           No
city_id_index BTREE  No         No  city_id.      6356      A          Yes
  1. Export history
Name     Type           Collation        Attributes    Null   Default   Extra
id       int(10)        utf8_unicode_ci  UNSIGNED      No     None      AUTO_INCREMENT
email    varchar(255)   utf8_unicode_ci                No     None      

Indexes

            Type    Unique  Packed  Column  Cardinality Collation   Null
id          BTREE   Yes     No      id      113887      A           No
email_index BTREE   No      No      email   113887      A           No  

I need to get a top city ids which have the most emails (users). Also there is export_history table. I need to exclude emails from the results.

The end query looks like

Main query

SELECT COUNT(city_id) as city_count, city_id
    FROM export e
        WHERE NOT EXISTS (
            SELECT * FROM export_history ehistory
                WHERE e.email = ehistory.email
            ) 
        GROUP BY city_id
            ORDER BY city_count DESC
                   LIMIT 5

Exec time is ~7 sec. The problem is that it takes about that much to execute.

Explain shows:

id select_type       table     type   possible_keys  key            key_len  ref     rows    Extra
1 PRIMARY            e         index  NULL           city_id_index  5        NULL    769169  Using where; Using temporary; Using filesort
2 DEPENDENT SUBQUERY ehistory  ref    email_index    email_index    767      e.email 1     Using where; Using index

Please take into account that these two queries work very fast > 0.01 sec

Query 1

SELECT COUNT(city_id) as city_count, city_id
    FROM export
        GROUP BY city_id
            ORDER BY city_count DESC
                   LIMIT 5

Exec time is ~0.1 sec

Query 2

SELECT *
    FROM export e
        WHERE NOT EXISTS (
            SELECT * FROM export_history ehistory
                WHERE e.email = ehistory.email
            ) 

Exec time is ~0.02 sec

Could you please recommend any suggestions to improve performance of the main query?

1

There are 1 best solutions below

1
On

You can simplify your query by using LEFT JOIN ... IS NULL rather than NOT EXISTS with your dependent subquery. It may (or may not: try it) speed things up for you by avoiding repetition of the dependent subquery.

SELECT COUNT(e.city_id) as city_count, e.city_id
  FROM export e
  LEFT JOIN export_history ehistory ON e.email = ehistory.email
 WHERE ehistory.id IS NULL
 GROUP BY e.city_id
 ORDER BY COUNT(e.city_id) DESC
 LIMIT 5;

Try this compound index.

CREATE INDEX exp_email_cityid ON export(email, city_id);

and if that doesn't help try an index with the columns in the opposite order:

CREATE INDEX exp_cityid_email ON export(city_id, email);

Pro tip: Single-column indexes don't do the same thing as multi-column indexes created to match the filtering criteria in your queries.