Combine two queries to count distinct strings with different filters

322 Views Asked by At

I am trying to write a single Postgres query that will result in a table which looks like this:

source |yes_gap|no_gap|
-------|-------|------|
allivet|     29|    25|
amazon |    692|   255|

I've been able to write two separate queries, but haven't been able to figure out how to combine them into one.

Here's my query for product_gap='yes':

select 
source,
count(distinct(sku)) as yes_gap
from product_gaps where 
product_gap='yes' and
ingestion_date <= '2021-05-25' 
/* aggregate by source */
group by source

Result:

source |yes_gap|
-------|-------|
allivet|     29|
amazon |    692|

And here's my query for product_gap='no':

select 
source,
count(distinct(sku)) as no_gap
from product_gaps where 
product_gap='no' and
ingestion_date <= '2021-05-25' 
/* aggregate by source */
group by source

Result:

source |no_gap|
-------|------|
allivet|    25|
amazon |   255|

Can I get both counts in a single query?

2

There are 2 best solutions below

1
On BEST ANSWER

Much faster and simpler with conditional aggregates using the aggregate FILTER clause:

SELECT source
     , count(DISTINCT sku) FILTER (WHERE product_gap = 'yes') AS yes_gap
     , count(DISTINCT sku) FILTER (WHERE product_gap = 'no')  AS no_gap
FROM   product_gaps
WHERE  ingestion_date <= '2021-05-25'
GROUP  BY source;

See:

Aside 1: DISTINCT is a key word, not a function. Don't add parentheses for the single column. distinct(sku) is short notation for DISTINCT ROW(sku). It happens to work because Postgres strips the ROW wrapper for a single column, but it's just noise.

Aside 2: product_gap should probably be boolean.

0
On

You did 95% of the work , it just remains to join the 2 sources

 SELECT source, yes_gap,no_gap FROM 
       ( select 
    source,
    count(distinct(sku)) as yes_gap
    from product_gaps where 
    product_gap='yes' and
    ingestion_date <= '2021-05-25' 
    /* aggregate by source */
    group by source ) r1
       FULL OUTER JOIN
       ( select 
    source,
    count(distinct(sku)) as no_gap
    from product_gaps where 
    product_gap='no' and
    ingestion_date <= '2021-05-25' 
    /* aggregate by source */
    group by source ) r2
       USING ( source ) 
       ORDER BY source;