Add business days to pandas dataframe with dates and skip over holidays python

9k Views Asked by At

I have a dataframe with dates as seen in the table below. 1st block is what it should look like and the 2nd block is what I get when just adding the BDays. This is an example of what it should look like when completed. I want to use the 1st column and add 5 business days to the dates, but if the 5 Bdays overlaps a holiday (like 15 Feb'21) then I need to add one additional day. It is fairly simple to add the 5Bday using pandas.tseries.offsets import BDay, but i cannot skip the holidays while using the dataframe.

I have tried to use pandas.tseries.holiday import USFederalHolidayCalendar, the workdays and workalendar modules, but cannot figure it out. Anyone have an idea what I can do.

Correct Example

DATE EXIT DATE +5
2021/02/09 2021/02/17
2021/02/10 2021/02/18

Wrong Example

DATE EXIT DATE +5
2021/02/09 2021/02/16
2021/02/10 2021/02/17

Here are some examples of code I tried:

import pandas as pd
from workdays import workday
...
df['DATE'] = workday(df['EXIT DATE +5'], days=5, holidays=holidays)

Next Example:

import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
bday_us = pd.offsets.CustomBusinessDay(calendar=USFederalHolidayCalendar())
dt = df['DATE']
df['EXIT DATE +5'] = dt + bday_us

=========================================

Final code:

Below is the code I finally settled on. I had to define the holidays manually due to the days the NYSE actually trades. Like for instance the day Pres Bush was laid to rest.

import datetime as dt
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import BDay

from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday, nearest_workday, \
    USMartinLutherKingJr, USPresidentsDay, GoodFriday, USMemorialDay, \
    USLaborDay, USThanksgivingDay

class USTradingCalendar(AbstractHolidayCalendar):
    rules = [
        Holiday('NewYearsDay', month=1, day=1, observance=nearest_workday),
        USMartinLutherKingJr,
        USPresidentsDay,
        GoodFriday,
        USMemorialDay,
        Holiday('USIndependenceDay', month=7, day=4, observance=nearest_workday),
        Holiday('BushDay', year=2018, month=12, day=5),
        USLaborDay,
        USThanksgivingDay,
        Holiday('Christmas', month=12, day=25, observance=nearest_workday)
    ]

offset = 5

df = pd.DataFrame(['2019-10-11', '2019-10-14', '2017-04-13', '2018-11-28', '2021-07-02'], columns=['DATE'])
df['DATE'] = pd.to_datetime(df['DATE'])

def offset_date(start, offset):
  return start + pd.offsets.CustomBusinessDay(n=offset, calendar=USTradingCalendar())

df['END'] = df.apply(lambda x: offset_date(x['DATE'], offset), axis=1)
print(df)
2

There are 2 best solutions below

10
xicocaio On BEST ANSWER

Input data

df = pd.DataFrame(['2021-02-09', '2021-02-10', '2021-06-28', '2021-06-29', '2021-07-02'], columns=['DATE'])
df['DATE'] = pd.to_datetime(df['DATE'])

Suggested solution using apply

from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import BDay

def offset_date(start, offset):
  return start + pd.offsets.CustomBusinessDay(n=offset, calendar=USFederalHolidayCalendar())

offset = 5
df['END'] = df.apply(lambda x: offset_date(x['DATE'], offset), axis=1)

    DATE        END
    2021-02-09  2021-02-17
    2021-02-10  2021-02-18
    2021-06-28  2021-07-06
    2021-06-29  2021-07-07
    2021-07-02  2021-07-12

PS: If you want to use a particular calendar such as the NYSE, instead of the default USFederalHolidayCalendar, I recommend following the instructions on this answer, about creating a custom calendar.

Alternative solution which I do not recommend

Currently, to the best of my knowledge, pandas do not support a vectorized approach to your problem. But if you want to follow a similar approach to the one you mentioned, here is what you should do.

First, you will have to define an arbitrary far away end date that includes all the periods you might need and use it to create a list of holidays.

holidays = USFederalHolidayCalendar().holidays(start='2021-02-09', end='2030-02-09')

Then, you pass the holidays list to CustomBusinessDay through the holidays parameter instead of the calendar to generate the desired offset.

offset = 5
bday_us = pd.offsets.CustomBusinessDay(n=offset, holidays=holidays)
df['END'] = df['DATE'] + bday_us

However, this type of approach is not a true vectorized solution, even though it might seem like it. See the following SO answer for further clarification. Under the hood, this approach is probably doing a conversion that is not efficient. This why it yields the following warning.

PerformanceWarning: Non-vectorized DateOffset being applied to Series or DatetimeIndex

0
fthomson On

Here's one way to do it

import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from datetime import timedelta as td

def get_exit_date(date):
    holiday_list = cals.holidays(start=date, end=date + td(weeks=2)).tolist()
    # 6 periods since start date is included in set
    n_bdays = pd.bdate_range(start=date, periods=6, freq='C', holidays=holiday_list)
    return n_bdays[-1]

df = pd.read_clipboard()
cals = USFederalHolidayCalendar()
# I would convert this to datetime
df['DATE'] = pd.to_datetime(df['DATE'])
df['EXIT DATE +5'] = df['DATE'].apply(get_exit_date)

this is using bdate_range which returns a datetime index

Results:

    DATE    EXIT DATE +5
0   2021-02-09  2021-02-17
1   2021-02-10  2021-02-18

Another option is instead of dynamically creating the holiday list. You could also just choose a start date and leave it outside the function like so:

def get_exit_date(date):
    # 6 periods since start date is included in set
    n_bdays = pd.bdate_range(start=date, periods=6, freq='C', holidays=holiday_list)
    return n_bdays[-1]

df = pd.read_clipboard()
cals = USFederalHolidayCalendar()
holiday_list = cals.holidays(start='2021-01-01').tolist()
# I would convert this to datetime
df['DATE'] = pd.to_datetime(df['DATE'])
df['EXIT DATE +5'] = df['DATE'].apply(get_exit_date)