Looking at the various answers for ORDER BY with CASE like this one, I see that what I am forced to do in this legacy application is likely an expert method; however, it is too slow when the rows are less than trivial (rows of 100,000 or more cause page loads of 10 seconds).
Please note that the original query seeks to solve an apparently common problem where the query analyst needs dates that are empty sorted counter to how they would normally be sorted. In this case, datefirstprinted is to be descending, but all records that are not printed should be populated to the top of the list.
The Original Query solves this, but the point of the question is to avoid the filesort performance hit that comes with the derived column notprintedyet.
Original Query
SELECT SQL_NO_CACHE
id, daterun, datefirstprinted,
case datefirstprinted when "0000-00-00 00:00:00" then 1 else 0 end as notprintedyet
FROM
patientrecords
WHERE
dateuploaded <> '0000-00-00 00:00:00'
ORDER BY
notprintedyet desc, /* ordered via alias */
datefirstprinted desc
LIMIT 10;
time 1.52s
I found that not sorting on the alias notprintedyet saves a bit:
Slightly Faster Query
SELECT SQL_NO_CACHE
id, daterun, datefirstprinted,
case datefirstprinted when "0000-00-00 00:00:00" then 1 else 0 end as notprintedyet
FROM
patientrecords
WHERE
dateuploaded <> '0000-00-00 00:00:00'
ORDER BY
datefirstprinted = "0000-00-00 00:00:00" desc, /* directly ordered */
datefirstprinted
LIMIT 10;
time 1.37s
Optimal Speed, but missing required sorting of empty dates first
SELECT SQL_NO_CACHE
id, daterun, datefirstprinted,
case datefirstprinted when "0000-00-00 00:00:00" then 1 else 0 end as notprintedyet
FROM
patientrecords
WHERE
dateuploaded <> '0000-00-00 00:00:00'
ORDER BY
datefirstprinted /* not ordered properly */
LIMIT 10;
time 0.48s
I tried using a view
create view notprinted_patientrecords as (
SELECT id, daterun, datefirstprinted, case datefirstprinted when "0000-00-00 00:00:00" then 1 else 0 end notprintedyet
FROM patientrecords
WHERE dateuploaded <> '0000-00-00 00:00:00'
);
unfortunately when i run explain
explain select * from notprinted_patientrecords order by notprintedyet desc limit 10;
it shows that i am still using filesort and takes 1.51s aka no savings at all
Would it be faster if datefirstprinted default is NULL?
maybe, but in this legacy app that could do more harm than the 5 seconds extra in page load time
What else might we try? Stored procedures? Functions?
UPDATES
As suggested @strawberry - ORDER BY CASE
...
ORDER BY
case datefirstprinted when "0000-00-00 00:00:00" then 1 else 0 end, datefirstprinted
LIMIT 10;
time 1.52s
as requested by @e4c5, the explain output:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: patientrecords
type: range
possible_keys: dateuploaded,uploads_report
key: dateuploaded
key_len: 5
ref: NULL
rows: 299095
Extra: Using index condition; Using filesort
except for not ordered properly which has the following variance
rows: 10
Extra: Using where
create table statement
*************************** 1. row ***************************
Table: patientrecords
Create Table: CREATE TABLE `patientrecords` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`datecreated` datetime NOT NULL,
`dateuploaded` datetime NOT NULL,
`daterun` datetime NOT NULL,
`datebilled` datetime NOT NULL,
`datefirstprinted` datetime NOT NULL,
`datelastprinted` datetime NOT NULL,
`client` varchar(5) NOT NULL,
PRIMARY KEY (`id`),
KEY `dateuploaded` (`dateuploaded`),
KEY `daterun` (`daterun`),
KEY `uploads_report` (`dateuploaded`,`client`),
KEY `datefirstprinted` (`datefirstprinted`),
KEY `datelastprinted` (`datelastprinted`)
)
Following ideas learned on concatenated indexes thanks to @e4c5, I tried adding a key on the two columns (column used in
whereand column used incasebasedorderclause):This initially had no effect since mysql continued to use the index
dateuploaded.However adding
force indexreduces the query time:time 0.64 seconds
it is worth noting that i agree with @e4c5 that the extra index will eventually cause writes to have a performance hit; i'm counting on other roadmap development to help with the reduction of the index count. for now, implementing this will reduce the 10 second page loads of the larger result sets to the manageable 3 second range and is then the solution that will be implemented.