(Simplified example). I have this table:
USER ID | code | sales
1 x 100
1 y 200
1 z 150
2 x 300
2 z 17
2 y 100
For each user, I need to show just the record with the max sales.
USER ID | code | sales
1 y 200
2 x 300
In the real table I have, It is millions of records. What would be the most efficient way of doing this?
Right now I find for each user the max sales record and join it unto itself to find the full record.
SELECT * from T
WHERE sales = (SELECT MAX(sales) FROM T WHERE user_id=outr.user_id)
Let's assume that sales number does not repeat itself for a specific user.
Refactor your dependent subquery into an independent subquery and then JOIN it.
The subquery gets the largest sale for each user it. The JOIN operation's ON clause retrieves the detail record for each user's largest sale.
A multicolumn index on
(user_id, sales)will make this query work efficiently when running on a large table.This works on both mysql and postgresql.