How to shift a column of numbers in a DataFrame by a specified time lag in Python


My DataFrame may have missing dates (weekends, holidays, etc.), so the dates are not strictly continuous, and the shift has to fall back to the latest available date. For example, if the date is 2023-09-08 and the lag is 3 days, but 2023-09-05 is not available, then look at 2023-09-04, and so on backwards until an available value is found. I may want to shift the column by days, months, or years.

import pandas as pd

values = range(1000)
begin_date = '2022-01-01'

df = pd.DataFrame({'value': values,
                   'date': pd.date_range(begin_date, periods=len(values))})
df.set_index('date', inplace=True)

# make the dates non-contiguous by dropping a block of rows
df.drop(df.loc[(df['value'] > 40) & (df['value'] < 50)].index, inplace=True)

print(df)

I can use the following:

df['new_value'] = df['value'].shift(3, freq='D')

But this relies on the date index being continuous, and I cannot get freq='M' (or 'm') to work. The error is: ValueError: cannot reindex on an axis with duplicate labels.

Any ideas? Or do I have to do it by brute force?

This is slightly different from other posted questions.
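One direction that might avoid brute force, offered as a minimal sketch rather than a confirmed answer: subtract the lag from every date, then look up the latest existing date on or before each result with reindex(..., method='ffill'). The helper name lag_asof and the column names lag_3d, lag_1m and lag_1y are made up for illustration, and the sketch assumes the date index is unique. Using pd.DateOffset(months=1) instead of freq='M' keeps the day of the month, whereas a month-end frequency moves many dates onto the same label, which is likely where the duplicate-labels error comes from.

```python
import pandas as pd

def lag_asof(series, offset):
    """Hypothetical helper: value at the latest available date on or
    before (date - offset). Assumes a unique DatetimeIndex."""
    s = series.sort_index()
    targets = s.index - offset  # the dates we would ideally read from
    # method="ffill" picks the last existing index label <= each target;
    # targets earlier than the first date give NaN.
    looked_up = s.reindex(targets, method="ffill")
    # Re-attach the original dates so the result aligns with the frame.
    return pd.Series(looked_up.to_numpy(), index=s.index, name=series.name)

# Same sample data as in the question: daily dates with a gap.
df = pd.DataFrame(
    {"value": range(1000)},
    index=pd.date_range("2022-01-01", periods=1000, name="date"),
)
df = df[(df["value"] <= 40) | (df["value"] >= 50)].copy()

df["lag_3d"] = lag_asof(df["value"], pd.DateOffset(days=3))
df["lag_1m"] = lag_asof(df["value"], pd.DateOffset(months=1))
df["lag_1y"] = lag_asof(df["value"], pd.DateOffset(years=1))

print(df.loc["2022-02-18":"2022-02-24"])
```

Rows whose target date falls in the dropped block take the last value before the gap, and rows whose target date is earlier than the first date in the data come out as NaN, so the new columns are floats.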
