How do you code an SQL query that determines if table A is a proper subset of table B?

401 Views Asked by At

If A is a proper subset of B or vice versa, then the query should return true. if not, then false:

If A or B is a proper set of each other, then true. if A or B is a improper set of each other, then false.

This is the code to create table A and table B:

Create table A (i int);
Create table B (i int);

Insert into A values (1);
Insert into A values (2);
Insert into A values (3);

Insert into B values (2);
Insert into B values (3);
Insert into B values (6);

This is the code that i wrote:

SELECT Count(*) >= 1 
FROM A 
RIGHT JOIN B ON A=B 
WHERE A IS NULL;

I get a true, but it should be a false.

'True' output

Does anybody know what is wrong with my code?

3

There are 3 best solutions below

2
On

I understand you want the query to check if either A is proper subset of A, or if B is a proper subset of A (both cannot be true at the same time).

A ⊂ B reads as: every element of A can be found in B, and not all elements of B exist in A.

In SQL this would be a full join and conditional logic:

select 
       (bool_and(b.i is not null) and bool_or(a.i is null))    -- A ⊂ B
    or (bool_and(a.i is not null) and bool_or(b.i is null))    -- B ⊂ A
from a
full join b using (i)
1
On

This answer is not Postgres specific.

select 1
from A full outer join B on B.i = A.i
having count(case when A.i is null then 1 end) = 0 or
       count(case when B.i is null then 1 end) = 0

You said "of each other" so I allowed for both options. Of course it you really mean to ask whether A is a proper subset of B then you really just want:

count(case when A.i is null then 1 end) = 0 and
count(case when B.i is null then 1 end) > 0

Counting nulls (on the A side) can also be expressed as:

count(*) - count(A.i)
0
On

Using the utility functions by intarray, this problem becomes much easier:

WITH A(i) AS (
    SELECT *
    FROM (VALUES (1),
                 (2),
                 (3)) as a
), B(i) AS (
    SELECT *
    FROM (VALUES (2),
                 (3),
                 (6)) as b
)
-- The above is just fake data for A and B
SELECT (SELECT ARRAY_AGG(i) FROM A) <@ (SELECT ARRAY_AGG(i) FROM B)

where <@ is the CONTAINED operator. It returns whether the left array is contained by the right array. If you want to account for either option, you could just chain the cases using OR:

WITH A(i) AS (
    SELECT *
    FROM (VALUES (1),
                 (2),
                 (3)) as a
), B(i) AS (
    SELECT *
    FROM (VALUES (2),
                 (3),
                 (6)) as b
)
-- The above is just fake data for A and B
SELECT (SELECT ARRAY_AGG(i) FROM A) <@ (SELECT ARRAY_AGG(i) FROM B)
    OR (SELECT ARRAY_AGG(i) FROM A) @> (SELECT ARRAY_AGG(i) FROM B)