Optimizing MySql query (IN) to avoid using “Using filesort”

1.5k Views Asked by At

I need your help to optimize the query to avoid using "Using filesort". The query is:

SELECT name
FROM actor
WHERE actor_id IN (3333,1452,2587,3003,3044,3524,3700,7087,7088)
ORDER BY name ASC

The Explain results:

1   SIMPLE  actor   range   PRIMARY PRIMARY 2       9   Using where; Using filesort

============================================================

SQL Fiddle http://sqlfiddle.com/#!2/50c4d/1/0

Table:

CREATE TABLE `actor` (
`actor_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(45) NOT NULL,  
PRIMARY KEY (`actor_id`),
UNIQUE KEY `name_UNIQUE` (`name`)
) ENGINE=InnoDB;

Sample data:

INSERT INTO `actor` VALUES (7087, 'Brill');
INSERT INTO `actor` VALUES (3333, 'Rey');
INSERT INTO `actor` VALUES (7088, 'Graves');
INSERT INTO `actor` VALUES (1452, 'Phoenix');
INSERT INTO `actor` VALUES (2587, 'Segal');
INSERT INTO `actor` VALUES (3003, 'Taylor');
INSERT INTO `actor` VALUES (3044, 'Daniels');
INSERT INTO `actor` VALUES (3524, 'Michaels');
INSERT INTO `actor` VALUES (3700, 'Tryme');

Index:

ADD INDEX idx_test (actor_id, name) -> EXTRA: Using where; Using index; Using filesort
2

There are 2 best solutions below

3
On

I'm always confused why people are so hell-bent on avoiding FILESORT !?!

You are asking for a sub-set of a table based on the actor_id. The server sees there is an index (PK or idx_test) on the actor_id field and will zip through the index to find the relevant records and return them. Now, additionally, you also want the output to be in a given order. Had the order been ORDER BY actor_id then it would have been possible to make use of the fact that the fetched records were already pre-sorted on this field in the index (or PK). Thus, no re-sorting would be required and the output could be returned 'as-is' (**).

BUT, you don't want them in order of actor_id, you want them in order of name. So, the machine does what you ask and sorts the records by name before returning them to you. I doubt sorting such a small list is going to take up a lot of resources or time.

PS: I don't think the index helps you much here, in fact it (badly!) duplicates the (clustered) PK. The only (potential) benefit i can see for such an index would be that IF the actual table is much wider THEN it would work as a covering index for this query, reducing the I/O (++). Mind you, this also means you can't ask for any of the other fields when querying.

(**: I'm sure it's all a bit more complex internally)

(++: Less I/O in case of SELECT, IUD will require more I/O)

0
On

You can use an index for the range predicate IN(...). Or you can use an index to eliminate the filesort.

You can't get both operations to use an index, at least not when the column in the predicate is different from the column in the sort order.

You created this index:

ADD INDEX idx_test (actor_id, name) -> EXTRA: Using where; Using index; Using filesort

This helped to find the matching actor_id values. And the composite index included the name column you wanted. But then you want it sorted by name. The index is not sorted by name, it's sorted by actor_id then by name.

Here's an Analogy to Explain Why This Doesn't Work:

Suppose I ask you to look in the telephone book and find people whose last names are Franklin, Hamilton, Jefferson, Washington. Then sort them by first name. The phone book is ordered by last name, then by first name. So you can find the last names quickly, but the first names are returned Benjamin, Alexander, Thomas, George -- not in any sensible order.

To sort them by first name, you'd have to collect the results and then sort them manually. The fact that they were sorted in the telephone book doesn't help.