Pandas fill forward and sum as you go

2.7k Views Asked by At

I have a sparse dataframe including dates of when inventory is bought or sold like the following:

Date         Inventory
2017-01-01       10 
2017-01-05       -5
2017-01-07       15
2017-01-09      -20

First step I would like to solve is to to add in the other dates. I know you can use resample but just highlighting this part in case it has an impact on the next more difficult part. As below:

Date         Inventory
2017-01-01       10
2017-01-02       NaN
2017-01-03       NaN
2017-01-04       NaN
2017-01-05       -5
2017-01-06       NaN
2017-01-07       15
2017-01-08       NaN
2017-01-09      -20

The final step is to have it fill forward over the NaNs except that once it encounters a new value that get added to the current value of the row above, so that the final dataframe looks like the following:

Date         Inventory
2017-01-01       10
2017-01-02       10
2017-01-03       10
2017-01-04       10
2017-01-05       5
2017-01-06       5
2017-01-07       20
2017-01-08       20
2017-01-09       0
2017-01-10       0

I am trying to get a pythonic approach to this and not a loop based approach as that will be very slow.

The example should also work for a table with multiple columns as such:

Date         InventoryA       InventoryB
2017-01-01       10              NaN
2017-01-02       NaN             NaN
2017-01-03       NaN              5
2017-01-04       NaN              5
2017-01-05       -5              NaN
2017-01-06       NaN             -10
2017-01-07       15              NaN
2017-01-08       NaN             NaN
2017-01-09      -20              NaN

would become:

Date         InventoryA     InventoryB
2017-01-01       10             0
2017-01-02       10             0
2017-01-03       10             5
2017-01-04       10            10
2017-01-05       5             10
2017-01-06       5              0
2017-01-07       20             0
2017-01-08       20             0
2017-01-09       0              0
2017-01-10       0              0

hope that helps too. I think the current solution will have a problem with the nans as such.

3

There are 3 best solutions below

0
On

You're simply doing the two steps in the wrong order :)

df['Inventory'].cumsum().resample('D').pad()

Edit: you might need to set the Date as index first.

df = df.set_index('Date')
0
On

Part 1 : Assuming df is your

Date         Inventory
2017-01-01       10 
2017-01-05       -5
2017-01-07       15
2017-01-09      -20

Then

import pandas as pd
import datetime

df_new = pd.DataFrame([df.Date.min() + datetime.timedelta(days=day) for day in range((df.Date.max() - df.Date.min()).days+1)])
df_new = df_new.merge(df, left_on=0, right_on='Date',how="left").drop("Date",axis=1)
df_new.columns = df.columns

Gives you :

    Date    Inventory
0   2017-01-01  10.0
1   2017-01-02  NaN
2   2017-01-03  NaN
3   2017-01-04  NaN
4   2017-01-05  -5.0
5   2017-01-06  NaN
6   2017-01-07  15.0
7   2017-01-08  NaN
8   2017-01-09  -20.0

part 2

From fillna method descriptions:

method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None Method to use for filling holes in reindexed Series pad / ffill: propagate last valid observation forward to next valid backfill / bfill: use NEXT valid observation to fill gap

df_new.Inventory = df_new.Inventory.fillna(method="ffill")

Gives you

    Date    Inventory
0   2017-01-01  10.0
1   2017-01-02  10.0
2   2017-01-03  10.0
3   2017-01-04  10.0
4   2017-01-05  -5.0
5   2017-01-06  -5.0
6   2017-01-07  15.0
7   2017-01-08  15.0
8   2017-01-09  -20.0

You should be able to generalise it for more than one column once you understood how it can be done with one.

0
On

You can just fill the missing values with 0 after resampling (no inventory change on that day), and then use cumsum

df.fillna(0).cumsum()