This is my first stackoverflow post after years of passive browsing. I am stuck on this problem and it's driving me nuts! Thanks so much for helping.
I have a data frame of inventory supply and demand bucketed by age. I have collected this data across time for many products. Demand for inventory of a certain age can be met with supply of the same age or younger. I am trying to calculate how much demand can be met with supply within each age bucket, filled youngest to oldest.
The data frame will be large (10^7 rows) so I am trying to do this with dplyr
, mutate
, lag
, and cumsum
versus a loop, which I suspect will be slow.
Here is a sample group within my data set (product and date groupings omitted):
library(dplyr)
Inventory <- data.frame(
Age = c(90, 120, 270, 365, Inf),
Demand = c(0, 5000, 25, 5000, 10),
Supply = c(4000, 50, 4000, 300, 0))
View(Inventory)
The result I am expecting is:
Result <- Inventory
Result$Start = c(0, 4000, 0, 3975, 0)
Result$In = c(4000, 50, 4000, 300, 0)
Result$Out = c(0, 4050, 25, 4275, 0)
Result$End = c(4000, 0, 3975, 0, 0)
Result$Short = c(0, 950, 0, 725, 10)
View(Result)
I applied standard inventory calculations above:
- Start = End
- In = Supply
- Out = min(Demand, Start + In)
- End = Start + In - Out
- Short = Demand - Out
I am having no luck using dplyr, but I think there is a solution using clever combinations of max, min, lag, and cumsum.
If speed becomes an issue over a large number of rows then potentially the fastest way to deal with iterating calculations is via
Rcpp
.You essentially need a cummulative-sum-but-floored-at-zero function, which adds the Supply - Demand outcome of each day to the last total and zeros it if it's negative. Here's a
cumnominus
trial function which gives the right table and can be used indplyr
:As a bit of a test against an R-only loop on a dataframe of 5m rows it takes around 0.05s compared to R-loop of 8.5s: