mysql row_number() performance

269 Views Asked by At

Sorry. This is a rewritten question because there was no answer.

I changed it from rank() to row_number() because the same rank is not needed.


I'm currently working on a query, but I'm not getting the speed so I'm asking a question.

What I want to find is 'count the most names in each category by DataID'.

Data_Category { id : INT PRIMARY KEY AUTO_INCREMENT, DataId: INT, name: varchar2(200), category: varchar2(200) }

and it is possible to input in duplicate.

select dataId, name, category, count(*) as cnt
, row_number() over (partition by dataId, category order by count(*) desc) as "ranking"
from data_category
group by dataId, name, category

In the above query, there is a difference in speed with and without the rank part.

If there is no row_number, it takes 0.0053, and if there is, it takes 0.5 seconds. The result is about 260,000 rows.

select *, row_number() over (partition by A.dataId, A.category order by cnt desc) as "ranking"
from(
  select dataId, name, category, count(*) as cnt
  from data_category
  group by dataId, name, category
) A

The above query also yields almost the same speed. The moment it is wrapped in a select statement, it is slowing down due to the temporary table creation.

The data is about 400,000 cases.

The index is (dataId, category, name) .

I want to extract 5 names with the highest number of names in each category by data ID, but I can't figure out the direction of the query.

And I will create 'view' using this query. So there are no additional where clause. After that, I want to use it by joining it with another table.

Even if you don't write a query, I sincerely ask for a simple hint..!

Output:

dataId category name rank
1 cate1 name1_1 1
1 cate1 name1_2 2
1 cate1 name1_5 3
1 cate1 name1_3 4
1 cate2 name2_1 1
1 cate2 name2_5 2
1 cate2 name2_3 3
2 cate1 name3_1 1
2 cate3 name3_9 1

Thank you.


Explain

#Query 1-2 : Without rank_row

id select_type table type key rows filtered extra
1 SIMPLE data_cateogory index IDX_39ea6497ea9ca40d9e783e 432344 100.00 Using index;

#Query 1-1 : With rank_row

id select_type table type key rows filtered extra
1 SIMPLE data_cateogory index IDX_39ea6497ea9ca40d9e783e 432344 100.00 Using index; Using temporary; Using filesort
1

There are 1 best solutions below

4
Rick James On

Does this give you what you need?

SELECT DataID, category, COUNT(*)
    FROM t
    GROUP BY DataID, category;

Or, if name has duplicates, replace COUNT(*) by COUNT(DISTINCT name)