Selecting Max record in nested Join more efficiently

131 Views Asked by At

I am trying to figure out the most efficient method of writing the query below. Right now it is using a user table of 3k records, scheduleday of 12k records, and scheduleuser of 300k records.

The method I am using works, but it is not fast. It is plenty fast of 100 and under records, but not how I need it displayed. I know there must be a more efficient way of running this, if i take out the nested select, it runs in .00025 seconds. Add the nested, and we're pushing 9+ seconds.

All I am trying to do is get the most recent date a user was scheduled. The scheduleuser table only tells the scheduleid and dayid. This is then looked up in scheduleday to get the date. I cant use max(scheduleuser.rec) because the order entered may not be in date order.

The result of this query would be: Bob 4/6/2022 Ralph 4/7/2022

Please note this query works perfectly fine, I am looking for ways to make it more efficient. Percona Server Mysql 5.5

SELECT 

(
SELECT MAX(STR_TO_DATE(scheduleday.ddate, '%m/%d/%Y')) FROM scheduleuser su1
LEFT JOIN scheduleday ON scheduleday.scheduleid=su1.scheduleid AND scheduleday.dayid=su1.dayid WHERE su1.idUser=users.idUser
) 
as lastsecheduledate, users.usersName

users

idUser usersName
1 bob
2 ralph

scheduleday

scheduleid dayid ddate
1 1 4/5/2022
1 2 4/6/2022
1 3 4/7/2022

scheduleuser (su1)

rec idUser dayid scheduleid
1 1 2 1
1 2 3 1
1 1 1 1

As requested, full query

SELECT users.iduser, users.adminName, users.firstname, users.lastname, users.lastLogin, users.area, users.type, users.terminationdate, users.termreason, users.cellphone,

(SELECT MAX(STR_TO_DATE(scheduleday.ddate, '%m/%d/%Y')) FROM scheduleuser "
'mySQL=mySQL&" LEFT JOIN scheduleday ON scheduleday.scheduleid=scheduleuser.scheduleid AND scheduleday.dayid=scheduleuser.dayid WHERE scheduleuser.iduser=users.iduser "
'mySQL=mySQL&" ) as lastsecheduledate,

IFNULL(userrating.rating,'0.00') as userrating, IFNULL(location.area,'') as userarea, IFNULL(usertypes.name,'') as usertype, IFNULL(useropen.iduser,0) as useropen 

FROM users 

mySQL=mySQL&" LEFT JOIN userrating ON userrating.iduser=users.iduser "
mySQL=mySQL&" LEFT JOIN location ON location.idarea=users.area "
mySQL=mySQL&" LEFT JOIN usertypes ON usertypes.idtype=users.type "
mySQL=mySQL&" LEFT JOIN useropen ON useropen.iduser=users.iduser "

WHERE

users.type<>0 AND users.active=1 

ORDER BY users.firstName

As requested, create tables



CREATE TABLE `users` (
  `idUser` int(11) NOT NULL,
  `usersName` varchar(255) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
ALTER TABLE `users`
  ADD PRIMARY KEY (`idUser`);
ALTER TABLE `users`
  MODIFY `idUser` int(11) NOT NULL AUTO_INCREMENT;
COMMIT;

CREATE TABLE `scheduleday` (
  `rec` int(11) NOT NULL,
  `scheduleid` int(11) NOT NULL,
  `dayid` int(11) NOT NULL,
  `ddate` varchar(255) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
ALTER TABLE `scheduleday`
  ADD PRIMARY KEY (`rec`),
  ADD KEY `dayid` (`dayid`),
  ADD KEY `scheduleid` (`scheduleid`);
ALTER TABLE `scheduleday`
  MODIFY `rec` int(11) NOT NULL AUTO_INCREMENT;
COMMIT;


CREATE TABLE `scheduleuser` (
  `rec` int(11) NOT NULL,
  `idUser` int(11) NOT NULL,
  `dayid` int(11) NOT NULL,
  `scheduleid` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
ALTER TABLE `scheduleuser`
  ADD PRIMARY KEY (`rec`),
  ADD KEY `idUser` (`idUser`),
  ADD KEY `dayid` (`dayid`),
  ADD KEY `scheduleid` (`scheduleid`);
ALTER TABLE `scheduleuser`
  MODIFY `rec` int(11) NOT NULL AUTO_INCREMENT;
COMMIT;

3

There are 3 best solutions below

0
On

When you nest a query in the SELECT as you're doing, that query will get evaluated for each record in the result set because its WHERE clause is utilizing a column from outside the query. You really just want to calculate a result set of max dates only once and join your users on after it is done:

select usersName, last_scheduled
from users 
     left join (select su.iduser, max(sd.ddate) as last_scheduled
                from scheduleuser as su left join scheduleday as sd on su.dayid = sd.dayid
                                                                       and su.scheduleid = sd.scheduleid
                group by su.iduser) recents on users.iduser = recents.iduser

I've obviously left your other columns off and just given you the name and date, but this is the general principle.

0
On

I think my recommendation would be to do that subquery once with a GROUP BY and join it. Something like

SELECT users.iduser, users.adminName, users.firstname, users.lastname, users.lastLogin, users.area, users.type, users.terminationdate, users.termreason, users.cellphone,
lsd.lastsecheduledate,
IFNULL(userrating.rating,'0.00') as userrating, IFNULL(location.area,'') as userarea, IFNULL(usertypes.name,'') as usertype, IFNULL(useropen.iduser,0) as useropen 
FROM users
LEFT JOIN (SELECT iduser, MAX(STR_TO_DATE(scheduleday.ddate, '%m/%d/%Y')) lastscheduledate FROM scheduleuser LEFT JOIN scheduleday ON scheduleday.scheduleid=scheduleuser.scheduleid AND scheduleday.dayid=scheduleuser.dayid
GROUP BY iduser
) lsd
ON lsd.iduser=users.iduser
LEFT JOIN userrating ON userrating.iduser=users.iduser 
LEFT JOIN location ON location.idarea=users.area 
LEFT JOIN usertypes ON usertypes.idtype=users.type 
LEFT JOIN useropen ON useropen.iduser=users.iduser 
WHERE
users.type<>0 AND users.active=1 
ORDER BY users.firstName

This will likely be more efficient since the DB can do the query once for all users, likely using your scheduleuser.iduser index.

If you are using something like above and it's still not performant, I might suggest experimenting with:

ALTER TABLE scheduleuser ADD INDEX (scheduleid, dayid)
ALTER TABLE scheduleday ADD INDEX (scheduleid, dayid)

This would ensure it can do the entire join in the subquery with the indexes. Of course, there are tradeoffs to adding more indexes, so depending on your data profile it might not be worth it (and it might not actually improve anything).

If you are using your original query, I might suggest experimenting with:

ALTER TABLE scheduleuser ADD INDEX (iduser,scheduleid, dayid)
ALTER TABLE scheduleday ADD INDEX (scheduleid, dayid)

This would allow it to do the subquery (both the JOIN and the WHERE) without touching the actual scheduleuser table at all. Again, I say "experiment" since there are tradeoffs and this might not actually improve things much.

0
On

Bug:

MAX(STR_TO_DATE(scheduleday.ddate, '%m/%d/%Y'))

Change to

STR_TO_DATE(MAX(scheduleday.ddate), '%m/%d/%Y')

Else you will be in for a rude surprise next January.

Possible better indexes. Switch from MyISAM to InnoDB. The following indexes assume InnoDB; they may not work as well in MyISAM.

users:  INDEX(active, type)
userrating:  INDEX(iduser,  rating)
location:  INDEX(idarea,  area)
usertypes:  INDEX(idtype,  name)
useropen:  INDEX(iduser)
scheduleday:  INDEX(scheduleid, dayid, ddate)
scheduleuser:  INDEX(iduser, scheduleid,  dayid)
users:  INDEX(iduser)

When adding a composite index, DROP index(es) with the same leading columns. That is, when you have both INDEX(a) and INDEX(a,b), toss the former.