I have a dataframe that looks as follows:
ID Date Prize IfWon
1 01-01-20 5 1
2 01-01-20 8 1
1 01-03-20 3 0
1 01-04-20 10 1
1 01-07-20 5 0
2 01-10-20 5 1
3 01-10-20 10 1
I want to add a new column that, for a given ID will include the sum of any prizes they won in the 7 days prior to that date and NOT including that date. The goal is to have a dataframe that looks like this:
ID Date Prize IfWon PrevWon
1 01-01-20 5 1 0
2 01-01-20 8 1 0
1 01-03-20 3 0 5
1 01-04-20 10 1 5
1 01-07-20 5 0 15
2 01-10-20 5 1 0
3 01-10-20 10 1 0
The code I have to do it is below, and it works, but I'm having two issues with it:
def get_rolling_prize_sum(grp, freq):
return grp.rolling(freq, on = 'Date', closed = 'right')['CurrentWon'].sum()
processed_data_df['CurrentWon'] = processed_data_df['Prize'] * processed_data_df['IfWon'] # gets deleted later
processed_data_df['PrevWon'] = processed_data_df.groupby('ID', group_keys=False).apply(get_rolling_prize_sum, '7D').astype(float) - processed_data_df['CurrentWon']
- Because I don't want to include the prize from the current day, I was trying to close the rolling on the right side, however this isn't working (e.g. taking out closed = 'right' above will do the exact same thing). As a result of this, I end up doing the subtraction in the last line.
- The actual DB I'm using is huge and I need to do many of these rolling sums at various points, but it's stupidly slow. I was told that I can do this without .apply, using .rolling directly, but I'm unable to get that to work properly. My attempt is below, with error, and I'll note that the error took several minutes to produce with this being the only significant computation, so it seems as if it's doing part of it and then failing later:
# Not using closed right here, just subtracting
processed_data_df['PrevWon'] = processed_data_df.groupby('ID', group_keys=False).rolling('7D', on = 'Date')['CurrentWon'].sum() - processed_data_df['CurrentWon']
ValueError: cannot join with no overlapping index names
Any ideas?
Improved previous answer and managed to tackle sorting of groupby
Outputs: