I want to do a rolling sum based on different levels of the index but am struggling to make it a reality. Instead of explaining the problem am giving below the demo input and desired output along with the kind of insights am looking for.
So I have multiple brands and each of their sales of various item categories in different year month day grouped by as below. What I want is a dynamic rolling sum at each day level, rolled over a window on Year as asked.
for eg, if someone asks
Demo question 1) Till a certain day(not including that day) what were their last 2 years' sales of that particular category for that particular brand.
I need to be able to answer this for every single day i.e every single row should have a number as shown in Table 2.0.
I want to be able to code in such a way that if the question changes from 2 years to 3 years I just need to change a number. I also need to do the same thing at the month's level.
demo question 2) Till a certain day(not including that day) what was their last 3 months' sale of that particular category for that particular year for that particular brand.
Below is demo input
The tables are grouped by brand,category,year,month,day and sum of sales from a master table which had all the info and sales at hour level each day
Table 1.0
| Brand | Category | Year | Month | Day | Sales |
|---|---|---|---|---|---|
| ABC | Big Appliances | 2021 | 9 | 3 | 0 |
| Clothing | 2021 | 9 | 2 | 0 | |
| Electronics | 2020 | 10 | 18 | 2 | |
| Utensils | 2020 | 10 | 18 | 0 | |
| 2021 | 9 | 2 | 4 | ||
| 3 | 0 | ||||
| XYZ | Big Appliances | 2012 | 4 | 29 | 7 |
| 2013 | 4 | 7 | 6 | ||
| Clothing | 2012 | 4 | 29 | 3 | |
| Electronics | 2013 | 4 | 9 | 1 | |
| 27 | 2 | ||||
| 5 | 4 | 5 | |||
| 2015 | 4 | 27 | 7 | ||
| 5 | 2 | 2 | |||
| Fans | 2013 | 4 | 14 | 4 | |
| 5 | 4 | 0 | |||
| 2015 | 4 | 18 | 1 | ||
| 5 | 17 | 11 | |||
| 2016 | 4 | 12 | 18 | ||
| Furniture | 2012 | 5 | 4 | 1 | |
| 8 | 6 | ||||
| 20 | 4 | ||||
| 2013 | 4 | 5 | 1 | ||
| 7 | 8 | ||||
| 9 | 2 | ||||
| 2015 | 4 | 18 | 12 | ||
| 27 | 15 | ||||
| 5 | 2 | 4 | |||
| 17 | 3 | ||||
| Musical-inst | 2012 | 5 | 18 | 10 | |
| 2013 | 4 | 5 | 6 | ||
| 2015 | 4 | 16 | 10 | ||
| 18 | 0 | ||||
| 2016 | 4 | 12 | 1 | ||
| 16 | 13 | ||||
| Utencils | 2012 | 5 | 8 | 2 | |
| 2016 | 4 | 16 | 3 | ||
| 18 | 2 | ||||
| 2017 | 4 | 12 | 13 |
Below is desired output for demo question 1 based on the demo table(last 2 years cumsum not including that day)
Table 2.0
| Brand | Category | Year | Month | Day | Sales | Conditional Cumsum(till last 2 years) |
|---|---|---|---|---|---|---|
| ABC | Big Appliances | 2021 | 9 | 3 | 0 | 0 |
| Clothing | 2021 | 9 | 2 | 0 | 0 | |
| Electronics | 2020 | 10 | 18 | 2 | 0 | |
| Utensils | 2020 | 10 | 18 | 0 | 0 | |
| 2021 | 9 | 2 | 4 | 0 | ||
| 3 | 0 | 4 | ||||
| XYZ | Big Appliances | 2012 | 4 | 29 | 7 | 0 |
| 2013 | 4 | 7 | 6 | 7 | ||
| Clothing | 2012 | 4 | 29 | 3 | 0 | |
| Electronics | 2013 | 4 | 9 | 1 | 0 | |
| 27 | 2 | 1 | ||||
| 5 | 4 | 5 | 3 | |||
| 2015 | 4 | 27 | 7 | 8 | ||
| 5 | 2 | 2 | 15 | |||
| Fans | 2013 | 4 | 14 | 4 | 0 | |
| 5 | 4 | 0 | 4 | |||
| 2015 | 4 | 18 | 1 | 4 | ||
| 5 | 17 | 11 | 5 | |||
| 2016 | 4 | 12 | 18 | 12 | ||
| Furniture | 2012 | 5 | 4 | 1 | 0 | |
| 8 | 6 | 1 | ||||
| 20 | 4 | 7 | ||||
| 2013 | 4 | 5 | 1 | 11 | ||
| 7 | 8 | 12 | ||||
| 9 | 2 | 20 | ||||
| 2015 | 4 | 18 | 12 | 11 | ||
| 27 | 15 | 23 | ||||
| 5 | 2 | 4 | 38 | |||
| 17 | 3 | 42 | ||||
| Musical-inst | 2012 | 5 | 18 | 10 | 0 | |
| 2013 | 4 | 5 | 6 | 10 | ||
| 2015 | 4 | 16 | 10 | 6 | ||
| 18 | 0 | 16 | ||||
| 2016 | 4 | 12 | 1 | 10 | ||
| 16 | 13 | 11 | ||||
| Utencils | 2012 | 5 | 8 | 2 | 0 | |
| 2016 | 4 | 16 | 3 | 0 | ||
| 18 | 2 | 3 | ||||
| 2017 | 4 | 12 | 13 | 5 |
End thoughts:
The idea is to basically do a rolling window over year column maintaining the 2 years span criteria and keep on summing the sales figures.
P.S I really need a fast solution due to the huge data size and therefore created a .apply function row-wise which I didn't find feasible. A better solution by using some kind of group rolling sum or supporting columns will be really helpful.
Here I'm giving a sample solution for the above problem.
I have concidered just onr product so that the solution would be simple
Code:
Here Tot would output the required cumsum column for the given data. Output:
Here you can specify the Time span using Number of days in Limit variable. Hope this solves the problem you are looking for.