MySQL select where not in another returned data from sql statement

126 Views Asked by At

I have this problem where I want to first select 8 elements from a mysql database ordering by id DESC. Then I want to select another group of results (8 items), this time order by date DESC but the results here I want to ensure that they are not already on the fisrt query the one for ordering by id. The data is in the same table just with different columns like id,name,date,.

So far I have tried writing different queries to get the data but the data contains some similar items of which that is what I don't want. Here are the queries I have written;

this returns 8 items sorted by id DESC

SELECT name FROM person order by id DESC LIMIT 8;

this returns 8 items also but sorted by date DESC

SELECT name FROM person order by date DESC LIMIT 8;

the returned data contain duplicate items!

3

There are 3 best solutions below

0
Aviran Ben David On BEST ANSWER

You could use a nested query, first select the first 8 id's, then select the first 8 records ordered by date, excluding those id's:

SELECT name FROM person 
WHERE id NOT IN
  (SELECT id FROM person order by id DESC LIMIT 8) AS exc
ORDER BY date DESC LIMIT 8
0
Joel Coehoorn On

The first query should return the primary key for the table. If name is the key then so be it, but probably that id field is the better choice.

Then we can write the query like this:

SELECT p.name 
FROM Person p
WHERE NOT EXISTS (
    SELECT 1
    FROM (SELECT id FROM Person ORDER BY id DESC LIMIT 8) p0
    WHERE p0.id = p.id
)
ORDER BY p.date DESC 
LIMIT 8;

We could also use an exclusion join which is usually slower, but in this case reduces one level of nesting so it might do better:

SELECT p.name
FROM Person p
LEFT JOIN (
    SELECT id 
    FROM Person 
    ORDER BY id DESC 
    LIMIT 8
) p0 ON p0.id = p.id
WHERE p0.id is null
ORDER BY p.date DESC
LIMIT 8;

One other thing to keep in mind is MySQL is strict about what kinds of subquery can use the LIMIT keyword. Specifically, you need it to be a derived table. I know the exclusion join option should qualify, but I'm less sure of the NOT EXISTS() option.

0
GMB On

Why not generate both resultsets with a single query? We can combine window functions, order by, and limit to generate a resultset containing the top 8 rows per id and the top 8 rows per date, while avoiding duplicates:

select *
from (
    select p.*,
        row_number() over(order by id desc) rn_id,
        row_number() over(order by date desc) rn_dt
    from person p
) p
order by case when rn_id <= 8 then rn_id else 9 end, rn_dt
limit 16

In the subquery, the window functions enumerate records by descending id and date. The outer query performs a conditional sort that puts the top 8 id first, and orders the rest of the records by descending date. All that is left to do is retain the top 16 results from the query. You don't need to worry about duplicates since the table is scanned only once.

Here is a small test case:

id date
1 2022-11-11
2 2022-11-09
3 2022-11-05
4 2022-11-06
5 2022-11-07
6 2022-11-08
7 2022-11-10

For this sample data, and given a target of 3 + 3 records (instead of 8 + 8 in our code), the query returns:

id date rn_id rn_dt
7 2022-11-10 1 2
6 2022-11-08 2 4
5 2022-11-07 3 5
1 2022-11-11 7 1
2 2022-11-09 6 3
4 2022-11-06 4 6

Typically, id 7, which has both the greatest id the second latest date, shows up in the first part of the resultset (the top 3 rows are sorted by descending id), but is not repeated in the second part.

Demo on DB fiddle