Optimized way to get top n records of each group

310 Views Asked by At

I need top 6 records of each subcategory from a mysql table having almost 100k records. I tried the following mysql query, but I'm concerned about its performance in my table having large number of records.

SELECT 
    *
FROM
    (SELECT 
        sub_cat_id,
        title,      
        @rn:=IF(@prev = sub_cat_id, @rn + 1, 1) AS rn,
        @prev:=sub_cat_id AS previd,
        created_date
    FROM
        blog
    WHERE
        type = 'BLOG'
            AND FIND_IN_SET(sub_cat_id, '1,2,8')
            AND created_date <= NOW()
    ORDER BY sub_cat_id DESC , created_date DESC) AS records
WHERE
    rn <= 6

In the above query, MySQL will number all the records having sub_cat_id 1,2 and 8. In the outer select query, the records will cut down to 6 records for each subcategory.

I'm having the following concerns

  • Is this the better and fastest way to get 6 records for each subcategory
  • Is this query generates desired result.

Here is my blog table

+-------- +------------+-------+--------------+
| blog_id | sub_cat_id | title | created_date |
+-------- +------------+-------+--------------+
| 1       | 1          | ABC   | 2018-05-25   |
| 2       | 1          | ABC   | 2018-05-22   |
| 3       | 2          | ABC   | 2018-05-23   |
| 4       | 2          | ABC   | 2018-05-21   |
| 5       | 2          | ABC   | 2018-05-20   |
| 6       | 8          | ABC   | 2018-05-15   |
+-------- +------------+-------+--------------+
1

There are 1 best solutions below

4
On BEST ANSWER

Your approach is fine, but your query is not. In particular, MySQL does not guarantee the order of evaluation of expressions in a SELECT, so you should not assign a variable in one expression and use it in another.

Fortunately, you can combine the assignments into a single expression:

SELECT b.*
FROM (SELECT b.sub_cat_id, b.title,  created_date     
             (@rn := IF(@sc = b.sub_cat_id, @rn + 1,
                        if(@sc := b.sub_cat_id, 1, 1)
                       )
             ) as rn
      FROM blog b CROSS JOIN
           (SELECT @sc := -1, @rn := 0) params
      WHERE b.type = 'BLOG' AND
            b.sub_cat_id IN (1, 2, 8) AND
            b.created_date <= NOW()  -- is this really needed?
      ORDER BY b.sub_cat_id DESC, b.created_date DESC) AS records
     ) b
WHERE rn <= 6;

For this query, you want indexes. I think this will work: type, sub_cat_id, created_date). Unfortunately, the group by will still require sorting the data. In more recent versions of MySQL, I think you need to do the sorting in a subquery and then the rn assignment afterwards.

I do wonder if this formulation could be made to be more effective:

select b.*
from blogs b
where b.type = 'BLOG' and
      b.sub_cat_id in (1, 2, 8) and
      b.created_at >= (select b2.created_at
                       from blogs b2
                       where b2.type = b.type and
                             b2.sub_cat_id = b.sub_cat_id
                       order by b2.created_at desc
                       limit 1 offset 5
                      );

For this, you want an index on blog(type, sub_cat_id, created_at).