Fill in missing dates and add lists together from column by row and front fill in pandas dataframe

40 Views Asked by At

I'd like to fill in the following dataframe with dates between a range and front fill all the columns. As I am completing this, I would like to append the lists in the Wells column so it becomes continues to add items by date.

Dataframe to expand by date

   StartDate     Wells                     count Sum_Cumm   vol
0   1967-10-01  [MUN-523, MUN-354, MUN-2660] 50   50    8503.323620
1   1968-01-01  [MUN-152]                     1   51    8336.591784
2   1968-03-01  [MUN-1032]                    1   52    8176.272712
3   1968-10-01  [MUN-16128]                   1   53    9191.110200

Code that I'm working on

newdf = (newdf.set_index('StartDate').reindex(pd.date_range('10-01-1967', '12-31-1994', freq='MS')).rename_axis(['StartDate']).reset_index()).ffill(newdf['vol'])

Dataframe that I'd like to end up with

 StartDate   Wells                                        count Sum_Cumm    vol
0   1967-10-01  [MUN-523, MUN-354, MUN-2660]                     50   50    8503.323620
1   1967-11-01  [MUN-523, MUN-354, MUN-2660]                     1    51    8503.323620    
2   1967-12-01  [MUN-523, MUN-354, MUN-2660]                     1    51    8503.323620
3   1968-01-01  [MUN-523, MUN-354, MUN-2660,MUN-152]             1    52    8336.591784
4   1968-02-01  [MUN-523, MUN-354, MUN-2660,MUN-152]             1    53    8336.591784
5   1968-03-01  [MUN-523, MUN-354, MUN-2660,MUN-152,MUN-1032]    1    53    8176.272712
6   1968-04-01  [MUN-523, MUN-354, MUN-2660,MUN-152,MUN-1032]    1    53    8176.272712
1

There are 1 best solutions below

0
SomeDude On BEST ANSWER

You can use period_range to create a new index and reindex the existing df to create a new dataframe and then ffill.

For Wells column do a cumsum and then apply np.unique

new_idx = pd.to_datetime(
    pd.period_range(df["StartDate"].min(), df["StartDate"].max(), freq="M")
    .asfreq("D", how="S")
    .strftime("%Y-%m-%d")
)
df2 = df.set_index("StartDate").reindex(new_idx)
df2 = df2.ffill(downcast="infer")
df2["Wells"] = df2["Wells"].cumsum().apply(np.unique)
df2 = df2.rename_axis("StartDate").reset_index()

print(df2)

    StartDate                                              Wells  count  \
0  1967-10-01                       [MUN-2660, MUN-354, MUN-523]     50   
1  1967-11-01                       [MUN-2660, MUN-354, MUN-523]     50   
2  1967-12-01                       [MUN-2660, MUN-354, MUN-523]     50   
3  1968-01-01              [MUN-152, MUN-2660, MUN-354, MUN-523]      1   
4  1968-02-01              [MUN-152, MUN-2660, MUN-354, MUN-523]      1   
5  1968-03-01    [MUN-1032, MUN-152, MUN-2660, MUN-354, MUN-523]      1   
6  1968-04-01    [MUN-1032, MUN-152, MUN-2660, MUN-354, MUN-523]      1   
7  1968-05-01    [MUN-1032, MUN-152, MUN-2660, MUN-354, MUN-523]      1   
8  1968-06-01    [MUN-1032, MUN-152, MUN-2660, MUN-354, MUN-523]      1   
9  1968-07-01    [MUN-1032, MUN-152, MUN-2660, MUN-354, MUN-523]      1   
10 1968-08-01    [MUN-1032, MUN-152, MUN-2660, MUN-354, MUN-523]      1   
11 1968-09-01    [MUN-1032, MUN-152, MUN-2660, MUN-354, MUN-523]      1   
12 1968-10-01  [MUN-1032, MUN-152, MUN-16128, MUN-2660, MUN-3...      1   

    Sum_Cumm          vol  
0         50  8503.323620  
1         50  8503.323620  
2         50  8503.323620  
3         51  8336.591784  
4         51  8336.591784  
5         52  8176.272712  
6         52  8176.272712  
7         52  8176.272712  
8         52  8176.272712  
9         52  8176.272712  
10        52  8176.272712  
11        52  8176.272712  
12        53  9191.110200