How to calculate churn based on customer ID / emails?

192 Views Asked by At

I'm trying to calculate customer churn on a monthly basis using the following data. However, unfortunately I do not have a boolean value which takes the value 1/True or 0/False when customers churned or not. You can assume that when a customer ordered in a certain month he/she has'nt churned and when he/she did not order in that month he/she is churned. I do understand that this results in fluctuations of the churn rate but that does not matter right now.

date CustomerID Items
2017-11-07 19:06:43 00001 Bread, Milk
2017-11-07 20:06:43 00002 Dough
2017-12-07 21:06:43 00003 Apples
2018-01-07 21:06:43 00002 Carrots
2018-01-07 21:06:43 00001 Keyboard, Soymilk
2018-02-07 21:06:43 00003 Pie
2018-03-07 21:06:43 00002 Water
2018-03-07 21:06:43 00003 Chicken
2018-04-07 21:06:43 00004 Chewing Gum

I tried resampling them by month using

    df_monthly = df.resample('M', on='date').count()

Subsequently I have the number of orders per month, but I do not know who churned. I hope someone can help me around with this.

Thanks a lot :)

1

There are 1 best solutions below

1
On

IIUC, group each row by month and convert CustomerID column to a set:

out = df.groupby(pd.Grouper(freq='M', key='date'))['CustomerID'].apply(set)
print(out)

# Output
date
2017-11-30    {00002, 00001}
2017-12-31           {00003}
2018-01-31    {00002, 00001}
2018-02-28           {00003}
2018-03-31    {00002, 00003}
2018-04-30           {00004}
Freq: M, Name: CustomerID, dtype: object

Now you have to find the difference between two sets:

out = out.to_frame().assign(CustomerIDPrev=out.shift(fill_value=set())) \
         .apply(lambda x: x.iloc[1].difference(x.iloc[0]), axis=1).tolist()
print(out)

# Output
date
2017-11-30                {}
2017-12-31    {00002, 00001}
2018-01-31           {00003}
2018-02-28    {00002, 00001}
2018-03-31                {}
2018-04-30    {00002, 00003}
Freq: M, dtype: object