Google bigquery (GBQ) churn / retention rate from previous month

480 Views Asked by At

I'm using google bigquery and try to understand monthly churn from the following table, e.g. 1 user (C) churned, and 1 user (D) is new.

My thought was to self join and see which user shows 'null' in current month as a churn, 'null' in previous month is new. But the query returns to "left" join result when I'm using "full join". How should I fix it? I'll be open to other approach as well! thank you!

userid active_month
A Jan 2022
B Jan 2022
D Jan 2022
A Dec 2021
B Dec 2021
C Dec 2021
A Nov 2021
B Nov 2021
select
t1.active_month,
count(distinct t1.userid) recent_user,
count(distinct t2.userid) previous_user,
count(distinct case when t1.userid is null then t2.userid end) as churned_user,
count(distinct case when t2.user is null then t1.userid end) as new_user
from table t1
full outer join table t2
on t1.active_month = datesub(t2.active_month, interval 1 month)
and t1.userid=t2.userid
1

There are 1 best solutions below

5
On

The two problems I see with your query are:

  • the join clause should be reversed, so subtracting 1 month from t1.active_date instead of t2.active_date
  • when you select the t1.active_month you need to coalesce with t2.active_month because it's a full outer join

I have slightly changed your query and added one more condition in the calculation of churned_user to exclude the t2.userid with t2.active_month equal to the most recent active_month. Now it should be correct.

select
coalesce(t1.active_month, t2.active_month) as active_month,
count(distinct t1.userid) recent_user,
count(distinct IF(t2.userid is not null and t2.active_month is not null and t2.active_month <> (select max(active_month) from table), t2.userid, NULL)) previous_user,
count(distinct case when t1.userid is null and t2.active_month <> (select max(active_month) from table) then t2.userid end) as churned_user,
count(distinct case when t2.userid is null then t1.userid end) as new_user
from table t1
full outer join table t2
on date_sub(t1.active_month, interval 1 month) = t2.active_month and t1.userid = t2.userid
group by 1
order by 1

Just as an alternative approach the following would also calculate for each month:

  • recent_user: as the number of active users for a month
  • previous_user: as the number of active users for the previous month
  • churned_user: as the number of users that were last active on a month
  • new_user: as the number of users that are first seen on a month

Window functions LAG and LEAD come in handy for calculating the previous and next active month for each userid

WITH
  table AS (
  SELECT
    MAX(active_month) OVER() AS current_month,
    userid,
    active_month,
    LAG(active_month) OVER(PARTITION BY userid ORDER BY active_month) AS previous_month,
    LEAD(active_month) OVER(PARTITION BY userid ORDER BY active_month) AS next_month
  FROM
    original_table
  ORDER BY
    userid,
    active_month)
SELECT
  active_month,
  COUNT(userid) AS recent_user,
  COUNTIF(DATE_DIFF(active_month, previous_month, MONTH) = 1) AS previous_user,
  COUNTIF(next_month IS NULL AND active_month <> current_month) AS churned_user,
  COUNTIF(previous_month IS NULL) AS new_user
FROM
  TABLE
GROUP BY
  active_month
ORDER BY
  active_month