Alternative to COUNT(DISTINCT column_name) for retrieving millions of data

2.6k Views Asked by At

I have a table that store more than 10 million of data and when I use simple query such as

SELECT TOP 10 COALESCE(a.Name, 'NA') AS Name, 
              COUNT(DISTINCT a.Id) AS Result1, 
              COUNT(b.Id) AS Result2 
FROM Table1 a INNER JOIN Table2 b ON a.Id = b.Id 
GROUP BY a.Name

it takes approximately 20 mins to return the result.

Is there any way to count the distinct column faster than using COUNT(DISTINCT)?

*I also tried to use GROUP BY but its just the same.

4

There are 4 best solutions below

0
On BEST ANSWER

if the column is a foreign key column you might can try to select the count from the foreign key table where the ID exists in the larger table.

SELECT COUNT(Id) 
FROM   SmallTable
WHERE  Id IN (Select foreignKeyID from LargeTable)
0
On

Without knowing your RDBMS it's impossible to say for sure, but most likely no, there is not a better way. I would guess that you need an index on that column, and that adding one would reduce the query time down to seconds.

5
On

You can try something like this:

select count(column_name) as n
from (select distinct column_name from yourTable) as a

This way, the heavy-weight task of deduplication is done by a simple SELECT.

I'm of course assuming that your column is indexed. If it's not, I strongly suggest you add an index to it.

0
On

A reasonable database should take advantage an index on table(column_name) for your query.