Count the number of elements in sets A&B, A&~B, ~A&B

89 Views Asked by At

Table A is a list of IDs.

ID
---
1
5
9
...

Table B has IDs and another Y/N value.

ID   Value
----------
1      0
2      1
3      0
...

I'd the count of Value for IDs that are:
i) both in tables A & B,
ii) in A but not B,
iii) in B but A.

One way is to use union:

SELECT Value, COUNT(*) AS count FROM tableA INNER JOIN tableB USING (id) GROUP BY Value
UNION 
SELECT Value, COUNT(*) AS count FROM tableA RIGHT JOIN tableB USING (id) WHERE tableA.id IS NULL GROUP BY Value

Is there a better way?

4

There are 4 best solutions below

0
GMB On BEST ANSWER

You want ids from both tables, which reads like a full join (if your database, which you did not tag, does support it). We can then compute the counts with conditional expressions.

select 
    sum(case when t1.id is not null and t2.id is not null then 1 else 0 end) cnt_in_both_tables,
    sum(case when t1.id is not null and t2.id is     null then 1 else 0 end) cnt_in_t1_only,
    sum(case when t1.id is     null and t2.id is not null then 1 else 0 end) cnt_in_t2_only
from mytable1 t1
full join mytable2 t2 on t1.id = t2.id

This assumes that ids are unique in both table, as shown in your sample data - otherwise we would need to deduplicate both datasets first.

In databases that do not support full joins, we would likely emulate it with union and subqueries.

0
June7 On

RDBMS not indicated and I use Access which does not support FULL OUTER JOIN. Consider:

SELECT "InBoth" AS Cat, B.ID, Count(*) AS Cnt FROM A INNER JOIN B ON A.ID = B.ID GROUP BY "Both", B.ID
UNION SELECT "InAnotB", A.ID, Count(*) FROM A LEFT JOIN B ON A.ID = B.ID WHERE (((B.ID) Is Null)) GROUP BY "InAnotB", A.ID
UNION SELECT "InBnotA", B.ID, Count(*) FROM B LEFT JOIN A ON B.ID = A.ID WHERE (((A.ID) Is Null)) GROUP BY "InBnotA", B.ID;
0
9DA On

The different sets (A&B, A&~B, ~A&B) represent different ways you can join your tables.

I am assuming the column ID is the primary key in table A as well as in table B, if not you need to add a DISTINCT inside the count, e.g. COUNT(DISTINCT A.id).

A&B = inner join table A and B on column ID -> returns only rows with IDs that exist in both tables

SELECT COUNT(A.id)
FROM A
INNER JOIN B ON A.id = B.id

A&~B = A left join B on column ID and filter on B.id is null -> keeps all values in table A, joins table B onto table A and keeps only the rows where it could not find a matching value in B

SELECT COUNT(A.id)
FROM A
LEFT JOIN B ON A.id = B.id
WHERE B.id IS NULL

~A&B = A right join B on column ID and filter on A.id is null -> keeps all values in table B, joins table A onto table B and keeps only the rows where it could not find a matching value in A

SELECT COUNT(B.id)
FROM A
RIGHT JOIN B ON A.id = B.id
WHERE A.id IS NULL

Alternatively, you can use a left join for ~A&B, just make sure your tables are on the correct side of the join depending on the intended outcome and that you filter on the correct column.

SELECT COUNT(B.id)
FROM B
LEFT JOIN A ON B.id = A.id
WHERE A.id IS NULL
1
mlopez11 On

Following the approach of the previous answer, but with a more compact solution (assuming that the ID field is mandatory and unique in both tables):

SELECT 
    SUM(CASE WHEN A.ID = B.ID THEN 1 END) AS "A and B",
    SUM(CASE WHEN B.ID is null THEN 1 END) AS "A but not B",
    SUM(CASE WHEN A.ID is null THEN 1 END) AS "B but not A" 
FROM tableA A FULL OUTER JOIN tableB B ON A.id = B.id