I'm trying to calculate customer churn on a monthly basis using the following data. However, unfortunately I do not have a boolean value which takes the value 1/True or 0/False when customers churned or not. You can assume that when a customer ordered in a certain month he/she has'nt churned and when he/she did not order in that month he/she is churned. I do understand that this results in fluctuations of the churn rate but that does not matter right now.
date | CustomerID | Items |
---|---|---|
2017-11-07 19:06:43 | 00001 | Bread, Milk |
2017-11-07 20:06:43 | 00002 | Dough |
2017-12-07 21:06:43 | 00003 | Apples |
2018-01-07 21:06:43 | 00002 | Carrots |
2018-01-07 21:06:43 | 00001 | Keyboard, Soymilk |
2018-02-07 21:06:43 | 00003 | Pie |
2018-03-07 21:06:43 | 00002 | Water |
2018-03-07 21:06:43 | 00003 | Chicken |
2018-04-07 21:06:43 | 00004 | Chewing Gum |
I tried resampling them by month using
df_monthly = df.resample('M', on='date').count()
Subsequently I have the number of orders per month, but I do not know who churned. I hope someone can help me around with this.
Thanks a lot :)
IIUC, group each row by month and convert
CustomerID
column to a set:Now you have to find the difference between two sets: