Calculate retention rate split by year

Question

Calculate retention rate split by year

297 Views Asked by Lebowski At 17 August 2025 at 19:06

Calculate retention rate / churn by year split

Dear Community, I am working on a data mining project where I would like to transform prior thinking from excel into R.

I have a customer database with contracts data and would like to calculate the retention rate. I was playing around with these library(lubridate); library(reshape2); library(plyr) but I couldn't figure it out how it works in R.

I have data like this:

ID    Customer        START          END
 1       Tesco   01-01-2000   31-12-2000
 2       Apple   05-11-2001   06-02-2002
 3         H&M   01-02-2002   08-05-2002
 4        Tesco  01-01-2001   31-12-2001
 5       Apple   01-01-2003   31-12-2004

I now was thinking of splitting the data into the Years (df2000, df2001) and then look it up again if the customer name exists in the main table (if yes return 1).

A result could look like this:

Customer     2000    2001    2002  2003   Retention Rate
Tesco         1        1      0     0          0.5
Apple         0        1      0     1
H&M           0        0      1     0

Original Q&A

There are 1 best solutions below

**Ronak Shah** · Accepted Answer

Using dplyr, you can try to get year value from each START date, count number of entries for each Customer and year, calculate the retention rate and spread the data to wide format.

library(dplyr)
df %>%
  mutate(year = format(as.Date(START, format = "%d-%m-%Y"), "%Y")) %>%
  dplyr::count(Customer, year) %>%
  group_by(Customer) %>%
  mutate(ret = n()/n_distinct(.$year))  %>%
  tidyr::spread(year, n, fill = 0) 

#  Customer   ret  `2000` `2001` `2002` `2003`
#  <fct>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#1 Apple     0.5       0      1      0      1
#2 H&M       0.25      0      0      1      0
#3 Tesco     0.5       1      1      0      0

EDIT

To consider data with Fiscal year instead from Oct-Sep we can do

library(lubridate)

df %>%
  mutate(START = dmy(START), 
         START = if_else(month(START) >= 10, START + years(1), START),
         year = year(START)) %>%
  dplyr::count(Customer, year) %>%
  group_by(Customer) %>%
  mutate(ret = n()/n_distinct(.$year))  %>%
  tidyr::spread(year, n, fill = 0)

data

df <- structure(list(ID = 1:5, Customer = structure(c(3L, 1L, 2L, 3L, 
1L), .Label = c("Apple", "H&M", "Tesco"), class = "factor"), 
START = structure(c(1L, 5L, 4L, 2L, 3L), .Label = c("01-01-2000", 
"01-01-2001", "01-01-2003", "01-02-2002", "05-11-2001"), class = "factor"), 
END = structure(c(3L, 1L, 2L, 4L, 5L), .Label = c("06-02-2002", 
"08-05-2002", "31-12-2000", "31-12-2001", "31-12-2004"), class = "factor")), 
class = "data.frame", row.names = c(NA, -5L))

Calculate retention rate split by year

There are 1 best solutions below

Related Questions in R

Related Questions in DATABASE

Related Questions in SPLIT

Related Questions in RETENTION

Related Questions in CHURN

Trending Questions

Popular # Hahtags

Popular Questions