Calculate customer activity in specific periods (e.g. 7 days)


I have managed to use CTEs to calculate whether a customer is active in a monthly period and not active in the next one (churn). So far this has been fairly straightforward. My code is below, for anyone else looking for how to do this. Each record in my dwh.marts.fact_customer_kpi table means that a customer was active, i.e. he or she spent money on our services.

with monthly_usage as (
  select
    userid as who_identifier,
    datediff(month, '1970-01-01', date) as time_period,
    date_part(mon,date) as month,
    date_part(yr,date) as year,
    CAST(
      CAST(date_part(yr,date) AS VARCHAR(4)) +
      RIGHT('0' + CAST(date_part(mon,date) AS VARCHAR(2)), 2) +
      RIGHT('0' + CAST(1 AS VARCHAR(2)), 2)
   AS DATETIME)as day
  from dwh.marts.fact_customer_kpi as k
          inner join dwh.marts.dim_user as u on u.user_id = k.userid
  where kpi = 'ACTIVE'
    and (datediff(month, CURRENT_DATE, registration_date) * -1) > 1
  group by 1,2,3,4,5
  order by 1,2,3,4,5)
,

lag_lead as (
  select who_identifier,
  time_period,
  year,
  month,
  day,
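    -- Explicit aliases instead of relying on the implicit column names "lag"/"lead";
    -- who_identifier is redundant in ORDER BY because it is already the partition key.
    lag(time_period,1) over (partition by who_identifier order by time_period) as lag,
    lead(time_period,1) over (partition by who_identifier order by time_period) as lead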
  from monthly_usage)

,

lag_lead_with_diffs as (
  select who_identifier,
    year,
    month,
    day,
    time_period,
    lag,
    lead,
    time_period-lag lag_size,
    lead-time_period lead_size
  from lag_lead)
,

calculated as (
select time_period,
  year,
  month,
  day,
  case when lag is null then 'NEW ACTIVE'
     when lag_size = 1 then 'ACTIVE'
     when lag_size > 1 then 'REACTIVATED'
  end as this_month_value,
  case when (lead_size > 1 OR lead_size IS NULL) then 'CHURN'
     else NULL
  end as next_month_churn,
  who_identifier,
  count(who_identifier) as countIdentifier
   from lag_lead_with_diffs group by 1,2,3,4,5,6,7)

select time_period,
    day,
  this_month_value,
  who_identifier,
  next_month_churn,
  sum(countIdentifier) as countIdentifier
  from calculated  group by 1,2,3,4,5
union
  select time_period+1,
  dateadd(month,1,day),
  'CHURN',
  who_identifier,
  next_month_churn,
  countIdentifier
  from calculated where next_month_churn is not null
order by 1;
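To show how the lag/lead classification above labels periods, here is a small self-contained sketch that runs the same CASE logic on three hypothetical rows (one made-up customer 'a', active in periods 1, 2 and 4) instead of the real tables. The column names prev_period and next_period are illustrative only.

```sql
-- Hypothetical sample data: customer 'a' is active in periods 1, 2 and 4.
with lag_lead as (
  select
    who_identifier,
    time_period,
    lag(time_period, 1) over (partition by who_identifier order by time_period) as prev_period,
    lead(time_period, 1) over (partition by who_identifier order by time_period) as next_period
  from (
    select 'a' as who_identifier, 1 as time_period
    union all select 'a', 2
    union all select 'a', 4
  ) as sample_usage
)
select
  time_period,
  -- Same rules as this_month_value in the calculated CTE.
  case when prev_period is null then 'NEW ACTIVE'
       when time_period - prev_period = 1 then 'ACTIVE'
       when time_period - prev_period > 1 then 'REACTIVATED'
  end as this_month_value,
  -- Same rule as next_month_churn: a gap or no later period means churn.
  case when next_period - time_period > 1 or next_period is null then 'CHURN'
  end as next_month_churn
from lag_lead
order by time_period;
-- Expected result:
-- 1 | NEW ACTIVE  | NULL
-- 2 | ACTIVE      | CHURN
-- 4 | REACTIVATED | CHURN
```

Note that the last period a customer has is always flagged CHURN, including the current, still-open period.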

However, I am now wondering whether there is an efficient way in Redshift to calculate periods based on specific dates. For example, I would like to calculate the same as above, but with 7-day periods counted from the date the customer registered instead of calendar months.

The changes would be confined to monthly_usage. I have tried using - interval '7 days', but without success so far; perhaps I am missing something.
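Subtracting an interval only shifts dates; it does not group them into periods. One possible approach, sketched below under assumptions the question does not confirm, is to count whole days since registration with DATEDIFF and integer-divide by 7, so that days 0-6 become period 0, days 7-13 period 1, and so on. The sketch assumes registration_date is a column of dim_user, date and kpi are columns of fact_customer_kpi, and no ACTIVE row predates registration. The CTE names activity and weekly_usage are hypothetical.

```sql
-- Sketch only: a 7-day, registration-relative replacement for monthly_usage.
-- Assumed (not confirmed by the question): registration_date is a column of
-- dim_user, date and kpi are columns of fact_customer_kpi, and no ACTIVE row
-- predates the customer's registration.
with activity as (
  select
    k.userid as who_identifier,
    u.registration_date,
    -- Whole days since registration, integer-divided by 7:
    -- days 0-6 -> period 0, days 7-13 -> period 1, and so on.
    (datediff(day, u.registration_date, k.date) / 7)::int as time_period
  from dwh.marts.fact_customer_kpi as k
  inner join dwh.marts.dim_user as u on u.user_id = k.userid
  where k.kpi = 'ACTIVE'
    -- Rough analogue of the monthly filter: registered more than 7 days ago.
    and datediff(day, u.registration_date, current_date) > 7
),

weekly_usage as (
  -- One row per customer per 7-day period; day is that period's first day.
  select distinct
    who_identifier,
    time_period,
    dateadd(day, 7 * time_period, registration_date) as day
  from activity
)

select *
from weekly_usage
order by who_identifier, time_period;
```

Because time_period is still a run of consecutive integers per customer, the lag/lead logic can stay as it is. The year and month columns have no meaning for these periods, so they would be dropped from the later CTEs (adjusting the positional GROUP BY lists), and in the final query dateadd(month,1,day) would become dateadd(day,7,day) so that the synthetic CHURN row lands on the following 7-day period.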

Can anyone point me to what I am missing (ideally with an example), or what changes would be required?

I am using Amazon Redshift.

Answer

Are you missing the date_trunc function? It sounds like it.

You can replace this:

    CAST(
      CAST(date_part(yr,date) AS VARCHAR(4)) +
      RIGHT('0' + CAST(date_part(mon,date) AS VARCHAR(2)), 2) +
      RIGHT('0' + CAST(1 AS VARCHAR(2)), 2)
   AS DATETIME)as day

with this:

    date_trunc('month', date) as day

Then I would parameterize this in whatever language you use to build the query, so you can easily swap in other dateparts. I would probably also replace datediff(month, '1970-01-01', date) with EXTRACT.
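For illustration, here is how monthly_usage might look with both suggestions applied. This is a sketch, not the answerer's code: it assumes date and kpi live on fact_customer_kpi and registration_date on dim_user, and the EXTRACT expression is one possible reading of "swap out for EXTRACT".

```sql
-- Sketch: monthly_usage with DATE_TRUNC replacing the CAST/RIGHT string
-- building and EXTRACT replacing the 1970-based DATEDIFF.
-- Assumed: date and kpi are columns of fact_customer_kpi,
-- registration_date is a column of dim_user.
with monthly_usage as (
  select
    k.userid as who_identifier,
    -- Consecutive month number. It differs from
    -- datediff(month, '1970-01-01', date) only by a constant offset,
    -- so the lag/lead differences computed later are unchanged.
    (extract(year from k.date) * 12 + extract(month from k.date))::int as time_period,
    date_part(mon, k.date) as month,
    date_part(yr, k.date) as year,
    -- First day of the month, as a timestamp.
    date_trunc('month', k.date) as day
  from dwh.marts.fact_customer_kpi as k
  inner join dwh.marts.dim_user as u on u.user_id = k.userid
  where k.kpi = 'ACTIVE'
    -- Same test as (datediff(month, CURRENT_DATE, registration_date) * -1) > 1.
    and datediff(month, u.registration_date, current_date) > 1
  group by 1, 2, 3, 4, 5
)

select *
from monthly_usage
order by who_identifier, time_period;
```

Parameterizing for other calendar granularities would mean changing the 'month' argument to date_trunc together with the time_period expression. Such periods are aligned to the calendar, though, not counted from each customer's registration date.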