MySQL: Is there any way to optimize multiple table ORDER BY with following DB structure?

48 Views Asked by At

I have two tables with following structure and indexes (real names are changed for purpose):

mysql> describe table1;
+---------+---------------------+------+-----+-------------------+----------------+
| Field   | Type                | Null | Key | Default           | Extra          |
+---------+---------------------+------+-----+-------------------+----------------+
| id      | bigint(20) unsigned | NO   | PRI | NULL              | auto_increment |
| field1  | varchar(255)        | NO   | UNI | NULL              |                |
| date    | timestamp           | NO   | MUL | CURRENT_TIMESTAMP |                |
| text    | varchar(10000)      | NO   |     | NULL              |                |
| flag    | tinyint(1)          | YES  |     | 0                 |                |
+---------+---------------------+------+-----+-------------------+----------------+

mysql> show indexes from table1;
+--------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name             | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| table1 |          0 | PRIMARY              |            1 | id          | A         |     1420047 |     NULL | NULL   |      | BTREE      |         |               |
| table1 |          0 | table1_field1_unique |            1 | field1      | A         |     1420047 |     NULL | NULL   |      | BTREE      |         |               |
| table1 |          1 | table1_date_idx      |            1 | date        | A         |     1420047 |     NULL | NULL   |      | BTREE      |         |               |
+--------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

mysql> describe table2;
+------------------------+---------------------+------+-----+---------+----------------+
| Field                   | Type                | Null | Key | Default | Extra          |
+------------------------+---------------------+------+-----+---------+----------------+
| id                      | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment |
| table1_id               | bigint(20) unsigned | NO   | MUL | NULL    |                |
| some1_id                | bigint(20) unsigned | YES  | MUL | NULL    |                |
| some1_name              | varchar(255)        | YES  | MUL | NULL    |                |
| some2_id                | bigint(20) unsigned | NO   | MUL | NULL    |                |
| some2_name              | varchar(255)        | NO   | MUL | NULL    |                |
| some3_name              | varchar(255)        | YES  | MUL | NULL    |                |
| some4_email             | varchar(255)        | YES  |     | NULL    |                |
| some4_name              | varchar(255)        | YES  | MUL | NULL    |                |
| some4_place1_gift       | varchar(255)        | YES  |     | NULL    |                |
| some4_place2_gift       | varchar(255)        | YES  |     | NULL    |                |
| some4_place3_gift       | varchar(255)        | YES  |     | NULL    |                |
+------------------------+---------------------+------+-----+---------+----------------+

mysql> show indexes from table2;
+--------------+------------+--------------------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table        | Non_unique | Key_name                 | Seq_in_index | Column_name| Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+--------------------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| table2|          0 | PRIMARY                         |            1 | id         | A         |      462911 |     NULL | NULL   |      | BTREE      |         |               |
| table2|          1 | table2_table1_table1_id_foreign |            1 | table1_id  | A         |      462911 |     NULL | NULL   |      | BTREE      |         |               |
| table2|          1 | some4_name_idx                  |            1 | some4_name | A         |        5645 |     NULL | NULL   | YES  | BTREE      |         |               |
| table2|          1 | some3_name_idx                  |            1 | some3_name | A         |        3560 |     NULL | NULL   | YES  | BTREE      |         |               |
| table2|          1 | some2_id_idx                    |            1 | some2_id   | A         |         116 |     NULL | NULL   |      | BTREE      |         |               |
| table2|          1 | some1_id_idx                    |            1 | some1_id   | A         |         390 |     NULL | NULL   | YES  | BTREE      |         |               |
| table2|          1 | some1_name_idx                  |            1 | some1_name | A         |        1727 |     NULL | NULL   | YES  | BTREE      |         |               |
| table2|          1 | some2_name_idx                  |            1 | some2_name | A         |         221 |     NULL | NULL   |      | BTREE      |         |               |
+--------------+------------+--------------------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

I have the following query:

SELECT
    table1.id AS table1_id,
    table1.field1,
    table1.date,
    table1.text,
    table2.id AS table2_id,
    table2.some1_id,
    table2.some1_name,
    table2.some2_id,
    table2.some2_name,
    table2.some3_name,
    table2.some4_email,
    table2.some4_name,
    table2.some4_place1_gift,
    table2.some4_place2_gift,
    table2.some4_place3_gift
FROM
    table2
INNER JOIN
    table1 ON table2.table1_id = table1.id
WHERE
    table2.some1_name = 'Some1_Name_Example'
    AND table2.some2_name = 'Some2_Name_Example'
    AND table2.some3_name = 'Some3_Name_Example'
    AND (
        table2.some2_id IN (1, 22, 975, 5981, 6127, 10861)
        OR table2.some1_id IN (2564, 4886, 12514, 12724, 13905, 15491, 16295, 18125, 22162, 20702, 20704, 21503)
        OR (table2.some2_id = 1277 AND table1.date < '2022-03-27 00:00:00')
        OR (table2.some2_id = 1678 AND table1.date < '2023-06-16 00:00:00')
        OR (table2.some2_id = 9519 AND table1.date < '2021-01-05 00:00:00')
        OR (table2.some1_id = 4648 AND table1.date < '2023-06-16 00:00:00')
    )
ORDER BY
    table1.date DESC,
    table2.id DESC
LIMIT 200;

Query time: 1.04 sec. This is EXPLAIN statement for this query:

+----+-------------+--------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+---------+----------------------------+--------+---------------------------------------------------------------------+
| id | select_type | table        | type   | possible_keys                                                                                       | key             | key_len | ref                  | rows   | Extra                                                               |
+----+-------------+--------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+---------+----------------------------+--------+---------------------------------------------------------------------+
|  1 | SIMPLE      | table2       | ref    | table2_table1_id_foreign,some3_name_idx,some2_id_idx,some1_id_idx,some1_name_index,some2_name_index | some3_name_idx  | 768     | const                | 231455 | Using index condition; Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | events       | eq_ref | PRIMARY,table1_date_idx                                                                             | PRIMARY         | 8       | db.table2.table1_id  |      1 | Using where                                                         |
+----+-------------+--------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+---------+----------------------------+--------+---------------------------------------------------------------------+

Then I remove ORDER BY table1.date DESC clause (leaving just ORDER BY table2.id DESC).

Query time: 0.12 sec. EXPLAIN statement:

+----+-------------+--------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+---------+--
| id | select_type | table        | type   | possible_keys                                                                                       | key             | key_len | ref                  | rows   | Extra        |
+----+-------------+--------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+---------+--
|  1 | SIMPLE      | table2       | ref    | table2_table1_id_foreign,some3_name_idx,some2_id_idx,some1_id_idx,some1_name_index,some2_name_index | some3_name_idx  | 768     | const                | 231455 | Using where  |
|  1 | SIMPLE      | events       | eq_ref | PRIMARY,table1_date_idx                                                                             | PRIMARY         | 8       | db.table2.table1_id  |      1 | Using where  |
+----+-------------+--------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+---------+---

Looks like ORDER BY table1.date DESC creates massive overhead in query execution. I am not sure if anything can be done in such case to optimize a query. Any suggestions?

P.S. I know that WHERE clauses look strange, but my main question in ORDER BY optimization.

1

There are 1 best solutions below

1
On BEST ANSWER

You have two performance problems with this scenario.

  1. SELECT whole-mess-of-rows ... ORDER BY something LIMIT small-number is a well-known performance antipattern. To satisfy the query the database must fetch a lot of data, sort it in a particular order, then discard all but a tiny fraction of it. It's sometimes possible to use an index to avoid the sorting task.

    This is why leaving out the ORDER BY clause makes your query faster. It just returns the first 200 rows, whichever rows were generated first. But those rows won't be very useful, because their order is unpredictable. You can't reasonably omit ORDER BY.

  2. A covering index will help the WHERE filters on this query. Try this index:

    CREATE INDEX names ON table2
       (some1_name, some2_name, some_3_name, some2_id, some1_id)
    

    It should make things a bit faster.

  3. If you're still strugging with performance you can try a so-called deferred join. First use a subquery to get the id values of the rows you need. Like this.

     SELECT
         table1.id AS table1_id,
         table2.id AS table2_id
     FROM
         table2
     INNER JOIN
         table1 ON table2.table1_id = table1.id
     WHERE
         table2.some1_name = 'Some1_Name_Example'
         AND table2.some2_name = 'Some2_Name_Example'
         AND table2.some3_name = 'Some3_Name_Example'
         AND (
             table2.some2_id IN (1, 22, 975, 5981, 6127, 10861)
             OR table2.some1_id IN (2564, 4886, 12514, 12724, 13905, 15491, 16295, 18125, 22162, 20702, 20704, 21503)
             OR (table2.some2_id = 1277 AND table1.date < '2022-03-27 00:00:00')
             OR (table2.some2_id = 1678 AND table1.date < '2023-06-16 00:00:00')
             OR (table2.some2_id = 9519 AND table1.date < '2021-01-05 00:00:00')
             OR (table2.some1_id = 4648 AND table1.date < '2023-06-16 00:00:00')
         )
     ORDER BY
         table1.date DESC,
         table2.id DESC
     LIMIT 200;
    

    That has the same sort-then-discard problem as your original query, but each row that gets sorted is far far smaller -- just a couple of id values and a `date.

    Then join it to your tables to get the details (I didn't debug this).

     SELECT
         table1.id AS table1_id,
         table1.field1,
         table1.date,
         table1.text,
         table2.id AS table2_id,
         table2.some1_id,
         table2.some1_name,
         table2.some2_id,
         table2.some2_name,
         table2.some3_name,
         table2.some4_email,
         table2.some4_name,
         table2.some4_place1_gift,
         table2.some4_place2_gift,
         table2.some4_place3_gift
         FROM (
             SELECT
                 table1.id AS table1_id,
                 table2.id AS table2_id
             FROM
                 table2
             INNER JOIN
                 table1 ON table2.table1_id = table1.id
             WHERE
                 table2.some1_name = 'Some1_Name_Example'
                 AND table2.some2_name = 'Some2_Name_Example'
                 AND table2.some3_name = 'Some3_Name_Example'
                 AND (
                     table2.some2_id IN (1, 22, 975, 5981, 6127, 10861)
                     OR table2.some1_id IN (2564, 4886, 12514, 12724, 13905, 15491, 16295, 18125, 22162, 20702, 20704, 21503)
                     OR (table2.some2_id = 1277 AND table1.date < '2022-03-27 00:00:00')
                     OR (table2.some2_id = 1678 AND table1.date < '2023-06-16 00:00:00')
                     OR (table2.some2_id = 9519 AND table1.date < '2021-01-05 00:00:00')
                     OR (table2.some1_id = 4648 AND table1.date < '2023-06-16 00:00:00')
                 )
             ORDER BY
                 table1.date DESC,
                 table2.id DESC
             LIMIT 200
          ) ids
     INNER JOIN table1 ON ids.table1_id = table1.id
     INNER JOIN table2 ON ids.table2_id = table2.id AND table2.table1_id = table1.id
     ORDER BY     
        table1.date DESC,
        table2.id DESC