Conditional count of rows where at least one peer qualifies

93 Views Asked by At

Background

I'm a novice SQL user. Using PostgreSQL 13 on Windows 10 locally, I have a table t:

+--+---------+-------+
|id|treatment|outcome|
+--+---------+-------+
|a |1        |0      |
|a |1        |1      |
|b |0        |1      |
|c |1        |0      |
|c |0        |1      |
|c |1        |1      |
+--+---------+-------+

The Problem

I didn't explain myself well initially, so I've rewritten the goal.

Desired result:

+-----------------------+-----+
|ever treated           |count|
+-----------------------+-----+
|0                      |1    |
|1                      |3    |
+-----------------------+-----+

First, identify id that have ever been treated. Being "ever treated" means having any row with treatment = 1.

Second, count rows with outcome = 1 for each of those two groups. From my original table, the ids who are "ever treated" have a total of 3 outcome = 1, and the "never treated", so to speak, have 1 `outcome = 1.

What I've tried

I can get much of the way there, I think, with something like this:

select treatment, count(outcome)
from t
group by treatment;

But that only gets me this result:

+---------+-----+
|treatment|count|
+---------+-----+
|0        |2    |
|1        |4    |
+---------+-----+
2

There are 2 best solutions below

1
On BEST ANSWER

For the updated question:

SELECT ever_treated, sum(outcome_ct) AS count
FROM  (
   SELECT id
        , max(treatment) AS ever_treated
        , count(*) FILTER (WHERE outcome = 1) AS outcome_ct
   FROM   t
   GROUP  BY 1
   ) sub
GROUP  BY 1;
 ever_treated | count 
--------------+-------
            0 |     1
            1 |     3

db<>fiddle here

Read:

  • For those who got no treatment at all (all treatment = 0), we see 1 x outcome = 1.
  • For those who got any treatment (at least one treatment = 1), we see 3 x outcome = 1.

Would be simpler and faster with proper boolean values instead of integer.

2
On

(Answer to updated question)

here is an easy to follow subquery logic that works with integer:

    select subq.ever_treated, sum(subq.count) as count
    from (select id, max(treatment) as ever_treated, count(*) as count 
          from t where outcome = 1 
          group by id) as subq 
    group by subq.ever_treated;