Avoid "filesort" on UNION RESULT

991 Views Asked by At

Sub-query 1:

SELECT * from big_table
where category = 'fruits' and name = 'apple'
order by yyyymmdd desc

Explain:

table       |   key           |   extra
big_table   |   name_yyyymmdd |   using where

Looks great!

Sub-query 2:

SELECT * from big_table
where category = 'fruits' and (taste = 'sweet' or wildcard = '*')
order by yyyymmdd desc

Explain:

table       |   key               |   extra
big_table   |   category_yyyymmdd |   using where

Looks great!

Now if I combine those with UNION:

SELECT * from big_table
where category = 'fruits' and name = 'apple'

UNION

SELECT * from big_table
where category = 'fruits' and (taste = 'sweet' or wildcard = '*')

Order by yyyymmdd desc

Explain:

table       |   key      |   extra
big_table   |   name     |   using index condition, using where
big_table   |   category |   using index condition
UNION RESULT|   NULL     |   using temporary; using filesort

Not so good, it uses filesort.

This is a trimmed down version of a more complexed query, here are some facts about the big_table:

  • big_table has 10M + rows
  • There are 5 unique "category"s
  • There are 5 unique "taste"s
  • There are about 10,000 unique "name"s
  • There are about 10,000 unique "yyyymmdd"s
  • I have created single index on each of those fields, plus composite idx such as yyyymmdd_category_taste_name but Mysql is not using it.
2

There are 2 best solutions below

1
Rick James On BEST ANSWER
SELECT * FROM big_table
    WHERE category = 'fruits'
      AND (  name = 'apple'
          OR taste = 'sweet'
          OR wildcard = '*' )
    ORDER BY yyyymmdd DESC

And have INDEX(catgory) or some index starting with category. However, if more than about 20% of the table is category = 'fruits' will probably decide to ignore the index and simply do a table scan. (Since you say there are only 5 categories, I suspect the optimizer will rightly eschew the index.)

Or this might be beneficial: INDEX(category, yyyymmdd), in this order.

The UNION had to do a sort (either in memory on disk, it is not clear) because it was unable to fetch the rows in the desired order.

A composite index INDEX(yyyymmdd, ...) might be used to avoid the 'filesort', but it won't use any columns after yyyymmdd.

When constructing a composite index, start with any WHERE columns compared '='. After that you can add one range or group by or order by. More details.

UNION is often a good choice for avoiding a slow OR, but in this case it would need three indexes

INDEX(category, name)
INDEX(category, taste)
INDEX(category, wildcard)

and adding yyyymmdd would not help unless you add a LIMIT.

And the query would be:

( SELECT * FROM big_table WHERE category = 'fruits' AND name = 'apple' )
UNION DISTINCT
( SELECT * FROM big_table WHERE category = 'fruits' AND taste = 'sweet' )
UNION DISTINCT
( SELECT * FROM big_table WHERE category = 'fruits' AND wildcard = '*' )
ORDER BY yyyymmdd DESC

Adding a limit would be even messier. First tack yyyymmdd on the end of each of the three composite indexes, then

( SELECT ... ORDER BY yyyymmdd DESC LIMIT 10 )
UNION DISTINCT
( SELECT ... ORDER BY yyyymmdd DESC LIMIT 10 )
UNION DISTINCT
( SELECT ... ORDER BY yyyymmdd DESC LIMIT 10 )
ORDER BY yyyymmdd DESC  LIMIT 10

Adding an OFFSET would be even worse.

Two other techniques -- "covering" index and "lazy lookup" might help, but I doubt it.

Yet another technique is to put all the words in the same column and use a FULLTEXT index. But this may be problematical for several reasons.

2
Bernd Buffen On

This must be also work without UNION

SELECT * from big_table
where 
    ( category = 'fruits' and name = 'apple' )
    OR
    ( category = 'fruits' and (taste = 'sweet' or wildcard = '*')
ORDER BY yyyymmdd desc;