How to claculate cumulative credit balance in R

43 Views Asked by At

At my company, we give customers products on credit which has to be paid off by the warranty expiration date of the product. At the purchase time, we record the clientID, the product category, the date of purchase, the amount of credit, and the product's warranty expiration date. Clients can repay in as many installments as they want during the repayment period. When the client makes a repayment, we record their ID, the amount repaid, and the date of repayment.

The code below generates 2 dummy dataframes that simulate the datasets I am working with:

# Set the seed for reproducibility
set.seed(123)

# Define the number of rows for each dataframe
n_purchase <- 20
n_repayment <- 20

# Define the possible values for each column
clientID <- 101:110
product_category <- LETTERS[1:3]
purchase_date <- seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")
credit_amount <- seq(1000, 2000, by = 100)
warranty_expiration_date <- purchase_date + years(1) + days(14)
repayment_amount <- seq(500, 1000, by = 50)
repayment_date <- purchase_date

# Generate the purchase_df dataframe
purchase_df <- tibble(
  clientID = sample(clientID, n_purchase, replace = TRUE),
  product_category = sample(product_category, n_purchase, replace = TRUE),
  purchase_date = sample(purchase_date, n_purchase, replace = FALSE),
  credit_amount = sample(credit_amount, n_purchase, replace = TRUE),
  warranty_expiration_date = purchase_date + years(1) + days(14)
)
# Generate the repayment_df dataframe
repayment_df <- tibble(
  clientID = sample(clientID, n_repayment, replace = TRUE),
  repayment_amount = sample(repayment_amount, n_repayment, replace = TRUE),
  repayment_date = sample(repayment_date, n_repayment, replace = FALSE)
)

Here is the complication; a client can come back and take another product while the previous credit is still active and when they make a repayment, we don't record for which product they are paying, we just consider their total outstanding credit and substract the amount just repaid.

After many years of operation, I would like to know which clients have been paying regularly and who represent a non-repayment risk.

My challenge now is, for each record in the purchase dataframe, add a column that holds the credit balance for that client at that date. Here is how I am trying to do it:

purchase_df %<>% full_join(repayment_df, by = "ClientID") %>% 
  filter(RepaymentDate %within% interval(purchaseDate, WarrantExpirationDate)) %>% 
  group_by(across(all_of(colnames(purchase_df)))) %>% 
  summarize(RepaidAmount = sum(RepaidAmount)/n()) %>% ungroup() 

This would give me a new column with the amount paid. If the client made 5 repayments and took 3 products, then I would add together the amount repaid and distribute it to all products (divide by 3).

My problem is that these different products have different warranty periods. So, it's difficult to know how to calculate the balance. Our business model is such that the assumption is that a repayment always substracts from the the oldest not repaid product first.

Any help with a code that can help me achieve this?

1

There are 1 best solutions below

1
Merijn van Tilborg On

A bit quick and dirty as I did not have a lot of time this week, however regarding your reply and providing sample data you could use the following approach. I did it in data.table but if you really want to stick to dplyr you can most likely find alternatives for the same approach.

library(data.table)

# make them a data.table
setDT(repayment_df)
setDT(purchase_df)

# order the data
setorder(purchase_df, clientID, purchase_date)
setorder(repayment_df, clientID, repayment_date)

# create an "event" date to merge on
x <- purchase_df[, date := purchase_date]
y <- repayment_df[, date := repayment_date]

z <- merge(x, y, by = c("clientID", "date"), all = T)
setnafill(z, fill = 0L, cols = c("credit_amount", "repayment_amount"))

z[, balance := cumsum(-shift(credit_amount, type = "lag", fill = 0L) + repayment_amount) - credit_amount, clientID]

z[, repayed_cum := cumsum(repayment_amount), clientID]
z[, credits_cum := cumsum(credit_amount), clientID]

z[!is.na(purchase_date), repayed := credits_cum < max(repayed_cum), clientID]

output

R: str(z)
Classes ‘data.table’ and 'data.frame':  40 obs. of  12 variables:
 $ clientID                : int  102 102 102 103 103 103 103 103 103 103 ...
 $ date                    : Date, format: "2020-07-02" "2020-07-14" "2020-10-16" "2020-02-10" ...
 $ product_category        : chr  NA NA "B" "C" ...
 $ purchase_date           : Date, format: NA NA "2020-10-16" "2020-02-10" ...
 $ credit_amount           : num  0 0 1700 1200 0 1500 1300 0 0 0 ...
 $ warranty_expiration_date: Date, format: NA NA "2021-10-30" "2021-02-24" ...
 $ repayment_amount        : num  1000 700 0 0 750 0 0 850 500 850 ...
 $ repayment_date          : Date, format: "2020-07-02" "2020-07-14" NA NA ...
 $ balance                 : num  1000 1700 0 -1200 -450 -1950 -3250 -2400 -1900 -1050 ...
 $ repayed_cum             : num  1000 1700 1700 0 750 750 750 1600 2100 2950 ...
 $ credits_cum             : num  0 0 1700 1200 1200 2700 4000 4000 4000 4000 ...
 $ repayed                 : logi  NA NA FALSE TRUE NA TRUE ...
 - attr(*, ".internal.selfref")=<externalptr> 
 - attr(*, "sorted")= chr [1:2] "clientID" "date"

notes

Your expected outcome is not 100% clear to me, nor at what date you want to take as reference to check if one is overdue nor what you consider a risk.

From above you at least know the balance outstanding at any new purchase and you know what purchases have been fully paid off.