SQL that check for Duplicates with NULL doesn't not return

90 Views Asked by At

I am using 2 separate query to check for duplicates and both query were supposed to give the same answer when executed, however it did not. Below is an example of my database table data:

id   Name   Age  Group SeatNo
------------------------------------------
1    Alpha  11    A    NULL
2    Bravo  12    A    1
3    Alpha  11    B    NULL

This is my first query where it shows there are duplicates

SELECT count(*)
FROM Test AS ta
JOIN Test AS tb ON ta.name=tb.name AND ta.age=tb.age
WHERE ta.GroupName='A'
AND tb.GroupName='B'

This is my second Query that shows zero duplicates

SELECT count(*)
FROM Test AS ta
JOIN Test AS tb ON ta.name=tb.name AND ta.age=tb.age AND ta.SeatNo=tb.SeatNo
WHERE ta.GroupName='A'
AND tb.GroupName='B

After looking through both query and the data in the table, it seems that NULL in the seatNo affected the second query and cause it to return 0 duplicates. Is there any solution to search for duplicates even though it is NULL?

3

There are 3 best solutions below

0
On BEST ANSWER

You have learned that NULL <> NULL. What can you do about it?

Well, ANSI SQL has a comparator for this purpose IS DISTINCT FROM. So, you could write:

SELECT count(*)
FROM Test AS ta JOIN
     Test AS tb
     ON ta.name = tb.name AND ta.age = tb.age AND
        NOT ta.SeatNo IS DISTINCT FROM tb.SeatNo
WHERE ta.GroupName = 'A' AND
      tb.GroupName = 'B';

However, most databases do not support that. A more general form is:

SELECT count(*)
FROM Test AS ta JOIN
     Test AS tb
     ON ta.name = tb.name AND ta.age = tb.age AND
        (ta.SeatNo = tb.SeatNo OR (ta.SeatNo IS NULL AND tb.SeatNo IS NULL) )
WHERE ta.GroupName = 'A' AND
      tb.GroupName = 'B';

Another method uses UNION ALL and aggregation. The following gets the duplicates:

select name, age, SeatNo
from ((select name, age, SeatNo, 'a' as which
       from test
       where GroupName = 'A'
      ) union all
      (select name, age, SeatNo, 'B' as which
       from test
       where GroupName = 'B'
      )
     ) ab
group by name, age, seatno
having count(distinct which) = 2;

You can then use this as a subquery to get the count:

select count(*)
from (select name, age, SeatNo
      from ((select name, age, SeatNo, 'a' as which
             from test
             where GroupName = 'A'
            ) union all
            (select name, age, SeatNo, 'B' as which
             from test
             where GroupName = 'B'
            )
           ) ab
      group by name, age, seatno
      having count(distinct which) = 2  
     ) ab;
0
On

For Second Query you can add one more filter condition in JOIN

SELECT count(*)
FROM Test AS ta
JOIN Test AS tb ON ta.name=tb.name AND 
                   ta.age=tb.age AND 
                   ( ta.SeatNo=tb.SeatNo OR (ta.SeatNo IS NULL AND tb.SeatNo IS NULL)
WHERE ta.GroupName='A'
AND tb.GroupName='B
0
On

Your Second Query should check for NULL. Try,

SELECT count(*)
FROM Test AS ta
JOIN Test AS tb ON ta.name=tb.name AND ta.age=tb.age AND ISNULL(ta.SeatNo,'')=ISNULL(tb.SeatNo,'')
WHERE ta.GroupName='A'
AND tb.GroupName='B