I'm using google bigquery and try to understand monthly churn from the following table, e.g. 1 user (C) churned, and 1 user (D) is new.
My thought was to self join and see which user shows 'null' in current month as a churn, 'null' in previous month is new. But the query returns to "left" join result when I'm using "full join". How should I fix it? I'll be open to other approach as well! thank you!
userid | active_month |
---|---|
A | Jan 2022 |
B | Jan 2022 |
D | Jan 2022 |
A | Dec 2021 |
B | Dec 2021 |
C | Dec 2021 |
A | Nov 2021 |
B | Nov 2021 |
select
t1.active_month,
count(distinct t1.userid) recent_user,
count(distinct t2.userid) previous_user,
count(distinct case when t1.userid is null then t2.userid end) as churned_user,
count(distinct case when t2.user is null then t1.userid end) as new_user
from table t1
full outer join table t2
on t1.active_month = datesub(t2.active_month, interval 1 month)
and t1.userid=t2.userid
The two problems I see with your query are:
I have slightly changed your query and added one more condition in the calculation of churned_user to exclude the t2.userid with t2.active_month equal to the most recent active_month. Now it should be correct.
Just as an alternative approach the following would also calculate for each month:
Window functions LAG and LEAD come in handy for calculating the previous and next active month for each userid