How to optimize join which causes very slow performace

676 Views Asked by At

This query runs more than 12 seconds, even though all tables are relatively small - about 2 thousands rows.

SELECT attr_73206_ AS attr_73270_ 
 FROM object_73130_ f1 
 LEFT OUTER JOIN (
   SELECT id_field, attr_73206_ FROM (
     SELECT m.id_field, t0.attr_73102_ AS attr_73206_ FROM object_73200_ o
     INNER JOIN master_slave m ON (m.id_object = 73130 OR m.id_object = 73290) AND (m.id_master = 73200 OR m.id_master = 73354) AND m.id_slave_field = o.id 
     INNER JOIN object_73101_ t0 ON t0.id = o.attr_73206_ 
     ORDER BY o.id_order 
   ) AS o GROUP BY o.id_field 
 ) AS o ON f1.id = o.id_field

Both tables have fields id as primary keys. Besides, id_field, id_order,attr_73206_ and all fields in master_slave are indexed. As for the logic of this query, on the whole it's of master-detail kind. Table object_73130_ is a master-table, table object_73200_ is a detail-table. They are linked by a master_slave table. object_73101_ is an ad-hoc table used to get a real value for the field attr_73206_ by its id. For each row in the master table the query returns a field from the very first row of its detail table. Firstly, the query had another look, but here at stackoverflow I was advised to use this more optimized structure (instead of a subquery which was used previously, and, by the way, the query started to run much faster). I observe that the subquery in the first JOIN block runs very fast but returns a number of rows comparable to the number of rows in the main master-table. In any way, I do not know how to optimize it. I just wonder why a simple fast-running join causes so much trouble. Oh, the main observation is that if I remove an ad-hoc object_73101_ from the query to return just an id, but not a real value, then the query runs as quick as a flash. So, all attention should be focused on this part of the query

INNER JOIN object_73101_ t0 ON t0.id = o.attr_73206_

Why does it slow down the whole query so terribly?

EDIT

In this way it runs super-fast

SELECT t0.attr_73102_ AS attr_73270_ 
FROM object_73130_ f1 
LEFT OUTER JOIN (
SELECT id_field, attr_73206_ FROM (
    SELECT m.id_field, attr_73206_ FROM object_73200_ o
    INNER JOIN master_slave m ON (m.id_object = 73130 OR m.id_object = 73290)  AND (m.id_master = 73200 OR m.id_master = 73354) AND m.id_slave_field = o.id 
    ORDER BY o.id_order 
) AS o GROUP BY o.id_field 
) AS o ON f1.id = o.id_field
LEFT JOIN object_73101_ t0 ON t0.id = o.attr_73206_ 

So, you can see, that I just put the add-hoc join outside of the subquery. But, the problem is, that subquery is automatically created and I have an access to that part of algo which creates it and I can modify this algo, and I do not have access to the part of algo which builds the whole query, so the only thing I can do is just to fix the subquery somehow. Anyway, I still can't understand why INNER JOIN inside a subquery can slow down the whole query hundreds of times.

EDIT

A new version of query with different aliases for each table. This has no effect on the performance:

SELECT attr_73206_ AS attr_73270_ 
FROM object_73130_ f1 
LEFT OUTER JOIN (
SELECT id_field, attr_73206_ FROM (
    SELECT m.id_field, t0.attr_73102_ AS attr_73206_ FROM object_73200_ a
    INNER JOIN master_slave m ON (m.id_object = 73130 OR m.id_object = 73290)  AND (m.id_master = 73200 OR m.id_master = 73354) AND m.id_slave_field = a.id 
    INNER JOIN object_73101_ t0 ON t0.id = a.attr_73206_ 
    ORDER BY a.id_order 
) AS b GROUP BY b.id_field 
) AS c ON f1.id = c.id_field

EDIT

This is the result of EXPLAIN command:

| id | select_type  |  TABLE  |  TYPE  | possible_keys         |     KEY     | key_len  |   ROWS  |  Extra                          |
| 1  |  PRIMARY     |   f1    |  INDEX |  NULL                 |    PRIMARY  |   4      |  1570   | USING INDEX
| 1  |  PRIMARY     | derived2|  ALL   |  NULL                 |    NULL     |  NULL    |  1564   |
| 2  |  DERIVED     | derived3|  ALL   |  NULL                 |    NULL     |  NULL    |  1575   | USING TEMPORARY; USING filesort
| 3  |  DERIVED     | m       | RANGE  | id_object,id_master,..|   id_object |   4      |  1356   | USING WHERE; USING TEMPORARY; USING filesort 
| 3  |  DERIVED     | a       | eq_ref | PRIMARY,attr_73206_   |   PRIMARY   |   4      |    1    |
| 3  |  DERIVED     | t0      | eq_ref | PRIMARY               |   PRIMARY   |   4      |    1    |

What is wrong with that?

EDIT

Here is the result of EXPLAIN command for the "super-fast" query

| id | select_type  |  TABLE  | TYPE  | possible_keys        |     KEY     | key_len  |   ROWS  |  Extra                          
| 1  |  PRIMARY     |   f1    | INDEX | NULL                 |    PRIMARY  |   4      |  1570   | USING INDEX
| 1  |  PRIMARY     | derived2| ALL   | NULL                 |    NULL     |  NULL    |  1570   |
| 1  |  PRIMARY     | t0      | eq_ref| PRIMARY              |    PRIMARY  |   4      |  1      | 
| 2  |  DERIVED     | derived3| ALL    | NULL                |   NULL      |   NULL   |  1581   | USING TEMPORARY; USING filesort 
| 3  |  DERIVED     | m       | RANGE  | id_object,id_master,|   id_bject  |   4      |  1356   | USING WHERE; USING TEMPORARY; USING filesort
| 3  |  DERIVED     | a       | eq_ref | PRIMARY             |   PRIMARY   |   4      |    1    |

CLOSED

I will use my own "super-fast" query, which I presented above. I think it is impossible to optimize it anymore.

1

There are 1 best solutions below

5
On

Without knowing the exact nature of the data/query, there are a couple things that I'm seeing:

  1. MySQL is notoriously bad at handling sub-selects, as it requires the creation of derived tables. In fact, some versions of MySQL also ignore indexes when using sub-selects. Typically, it's better to use JOINs instead of sub-selects, but if you need to use sub-selects, it's best to make that sub-select as lean as possible.

  2. Unless you have a very specific reason for putting the ORDER BY in the sub-select, it may be a good idea to move it to the "main" query portion because the result set may be smaller (allowing for quicker sorting).

So all that being said, I tried to re-write your query using JOIN logic, but I was wondering What table the final value (attr_73102_) is coming from? Is it the result of the sub-select, or is it coming from table object_73130_? If it's coming from the sub-select, then I don't see why you're bothering with the original LEFT JOIN, as you will only be returning the list of values from the sub-select, and NULL for any non-matching rows from object_73130_.

Regardless, not knowing this answer, I think the query below MAY be syntactically equivalent:

SELECT t0.attr_73102_ AS attr_73270_ 
FROM object_73130_ f1 
LEFT JOIN (object_73200_ o
  INNER JOIN master_slave m ON m.id_slave_field = o.id
  INNER JOIN object_73101_ t0 ON t0.id = o.attr_73206_)
ON f1.id = o.id_field
WHERE m.id_object IN (73130,73290) 
AND m.id_master IN (73200,73354) 
GROUP BY o.id_field
ORDER BY o.id_order;