Calculate retention rate / churn by year split
Dear Community, I am working on a data mining project where I would like to move my previous approach from Excel into R.
I have a customer database with contracts data and would like to calculate the retention rate.
I was playing around with library(lubridate), library(reshape2) and library(plyr), but I couldn't figure out how to do it in R.
I have data like this:
ID Customer START END
1 Tesco 01-01-2000 31-12-2000
2 Apple 05-11-2001 06-02-2002
3 H&M 01-02-2002 08-05-2002
4 Tesco 01-01-2001 31-12-2001
5 Apple 01-01-2003 31-12-2004
My idea was to split the data by year (df2000, df2001, ...) and then look up whether the customer name exists in the main table (if yes, return 1).
A result could look like this:
Customer 2000 2001 2002 2003 Retention Rate
Tesco 1 1 0 0 0.5
Apple 0 1 0 1
H&M 0 0 1 0
Using dplyr, you can get the year value from each START date, count the number of entries for each Customer and year, calculate the retention rate, and spread the data to wide format.
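The steps described above could be sketched roughly as follows. This is not the answer's original code, which is missing here. It assumes the data frame is called df, that the year is taken from START only (so a contract spanning several years counts only in its start year), and that the retention rate is the share of all observed years in which the customer has at least one contract, which is inferred from the 0.5 shown for Tesco in the question.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (the name df is an assumption)
df <- data.frame(
  ID = 1:5,
  Customer = c("Tesco", "Apple", "H&M", "Tesco", "Apple"),
  START = c("01-01-2000", "05-11-2001", "01-02-2002", "01-01-2001", "01-01-2003"),
  END = c("31-12-2000", "06-02-2002", "08-05-2002", "31-12-2001", "31-12-2004"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # extract the calendar year from the START date (dd-mm-yyyy)
  mutate(year = as.integer(format(as.Date(START, format = "%d-%m-%Y"), "%Y"))) %>%
  # number of contracts per customer and year
  count(Customer, year) %>%
  # total number of distinct years seen in the data
  mutate(total_years = n_distinct(year)) %>%
  group_by(Customer) %>%
  # assumed definition: years with a contract / all observed years
  mutate(retention_rate = n() / total_years) %>%
  ungroup() %>%
  select(-total_years) %>%
  # one column per year, 0 where the customer has no contract
  spread(year, n, fill = 0)

print(result)
```

Note that the year columns hold contract counts, so a customer with two contracts starting in the same year would show 2 rather than 1 in that column.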
EDIT: To use a fiscal year running from October to September instead of the calendar year, we can do
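A possible sketch of the fiscal-year variant is shown below. The answer's original code is not available here, so the details are assumptions: the helper fiscal_year() is hypothetical, and it labels each fiscal year by the calendar year in which it ends (October 2001 to September 2002 becomes 2002). If your convention labels the fiscal year by its starting year, subtract one for January to September instead.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (the name df is an assumption)
df <- data.frame(
  ID = 1:5,
  Customer = c("Tesco", "Apple", "H&M", "Tesco", "Apple"),
  START = c("01-01-2000", "05-11-2001", "01-02-2002", "01-01-2001", "01-01-2003"),
  END = c("31-12-2000", "06-02-2002", "08-05-2002", "31-12-2001", "31-12-2004"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: dates from October onwards fall into the next fiscal year
fiscal_year <- function(date) {
  yr <- as.integer(format(date, "%Y"))
  mo <- as.integer(format(date, "%m"))
  ifelse(mo >= 10, yr + 1L, yr)
}

result_fy <- df %>%
  # fiscal year of the START date instead of the calendar year
  mutate(year = fiscal_year(as.Date(START, format = "%d-%m-%Y"))) %>%
  count(Customer, year) %>%
  mutate(total_years = n_distinct(year)) %>%
  group_by(Customer) %>%
  # same assumed definition as before: years with a contract / all observed years
  mutate(retention_rate = n() / total_years) %>%
  ungroup() %>%
  select(-total_years) %>%
  spread(year, n, fill = 0)

print(result_fy)
```

With this convention the Apple contract starting on 05-11-2001 is counted in fiscal year 2002 rather than 2001.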
data