Calculate Retention Rate on one column in R


I am struggling to find the right command in R and would appreciate your advice.

I would like to calculate the retention rate for each customer. The customer_math column is a snapshot of when the customer was active over a time range of up to 8 years: each digit represents one period (1 = active, 0 = inactive), and the retention rate is the share of periods in which the customer was active.

customer  customer_math
Apple          1
Tesco          10
Nespresso      1001
Dell           11
BMW            11111100

The final dataset should look like this:

customer  customer_math      retention_rate
Apple          1                1
Tesco          10               0.5
Nespresso      1001             0.5
Dell           11               1
BMW            11111100         0.75

Any ideas on how I can solve this?

Your help is much appreciated. Thanks!

There are 3 best solutions below

Best answer:

library(tidyverse)
tribble(
    ~customer, ~customer_math,
      "Apple",              1,
      "Tesco",             10,
  "Nespresso",           1001,
       "Dell",             11,
        "BMW",       11111100
  ) %>%
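  # str_count() and str_length() coerce the numeric column to character
  mutate(active_count = str_count(customer_math, "1"),  # periods with activity
         periods = str_length(customer_math),           # total periods recorded
         retention_rate = active_count / periods)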

## A tibble: 5 x 5
#  customer  customer_math active_count periods retention_rate
#  <chr>             <dbl>        <int>   <int>          <dbl>
#1 Apple                 1            1       1           1   
#2 Tesco                10            1       2           0.5 
#3 Nespresso          1001            2       4           0.5 
#4 Dell                 11            2       2           1   
#5 BMW            11111100            6       8           0.75
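
One caveat worth checking before relying on the implicit number-to-string conversion: larger round values stored as doubles can be converted to scientific notation (for example, as.character(100000) gives "1e+05"), and any leading zeros are lost as soon as the snapshot is stored as a number. The following is only a minimal sketch of one way around this, assuming the snapshot can be kept or rebuilt as text; to_digit_string(), retention_rate() and the example values are hypothetical and not part of the original data.

```r
# Hypothetical helper: render numbers as plain digit strings,
# avoiding scientific notation such as "1e+05".
to_digit_string <- function(x) format(x, scientific = FALSE, trim = TRUE)

# Hypothetical helper: share of "1" characters in each digit string.
retention_rate <- function(s) {
  nchar(gsub("0", "", s, fixed = TRUE)) / nchar(s)
}

# Invented example values stored as doubles.
snapshots <- c(1, 100000, 11111100)
digit_strings <- to_digit_string(snapshots)
rates <- retention_rate(digit_strings)

# A snapshot that starts with inactive periods has to be stored as text,
# because the leading zeros cannot survive in a numeric column.
rate_leading_zeros <- retention_rate("00100000")
```

If the data can be read in as character from the start (for example by setting the column type on import), neither problem arises and the helper is not needed.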

You can remove all the 0s from the string, count the remaining characters with nchar(), and divide that by the total number of characters.

df$retention_rate <- with(df, nchar(gsub('0', '', customer_math, fixed = TRUE))/
                              nchar(customer_math))
df
#   customer customer_math retention_rate
#1     Apple             1           1.00
#2     Tesco            10           0.50
#3 Nespresso          1001           0.50
#4      Dell            11           1.00
#5       BMW      11111100           0.75

data

df <- structure(list(customer = structure(c(1L, 5L, 4L, 3L, 2L), 
.Label = c("Apple", "BMW", "Dell", "Nespresso", "Tesco"), class = "factor"), 
customer_math = c(1L, 10L, 1001L, 11L, 11111100L)), class = "data.frame", 
row.names = c(NA, -5L))

Another Base R solution achieving the desired result:

# Coerce the customer_math vector to character to enable
# the string split, then loop through each element:

df$retention_rate <- sapply(as.character(df$customer_math), function(x) {

  # Split the element into a vector comprised of its individual characters:
  elements_split <- unlist(strsplit(x, ""))

  # Divide the sum of the digits (the number of 1s) by the number of digits:
  rr <- sum(as.numeric(elements_split)) / length(elements_split)

  # Explicitly return the resulting rate:
  return(rr)
})
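
A more compact variant of the same idea, offered only as a sketch: it assumes customer_math contains nothing but the digits 0 and 1, and it rebuilds the example data with data.frame() so the snippet runs on its own. Taking the mean of a logical vector gives the share of TRUE values directly, and vapply() checks that each result is a single number.

```r
# Example data from the question, rebuilt so the snippet is self-contained.
df <- data.frame(
  customer = c("Apple", "Tesco", "Nespresso", "Dell", "BMW"),
  customer_math = c(1L, 10L, 1001L, 11L, 11111100L)
)

# Split every snapshot into its individual characters (one list element per row).
digits <- strsplit(as.character(df$customer_math), "", fixed = TRUE)

# mean(d == "1") is the proportion of periods in which the customer was active.
df$retention_rate <- vapply(digits, function(d) mean(d == "1"), numeric(1))
```

Because the list returned by strsplit() is unnamed here, the resulting column is a plain numeric vector without names.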

Data:

df <- structure(
  list(
    customer = structure(
      c(1L, 5L, 4L, 3L, 2L),
      .Label = c("Apple", "BMW", "Dell", "Nespresso", "Tesco"),
      class = "factor"
    ),
    customer_math = c(1L, 10L, 1001L, 11L, 11111100L)
  ),
  class = "data.frame",
  row.names = c(NA,-5L)
)