I have a dataframe with two dates, among other things. I need to filter out rows that have more than two working days difference between these two dates. I must take into consideration weekends and holidays.
*Assuming 10/17/2023 is a holiday...
Example df:
NAME DATE1 DATE2
CASE1 10/12/2023 10/13/2023 <--- one day difference
CASE2 10/12/2023 10/16/2023 <--- two days difference (weekend)
CASE3 10/12/2023 10/18/2023 <--- three days difference (weekends and holidays)
...
CASEX 10/12/2023 10/19/2023 <--- four days difference (weekends and holidays)
I need to save CASE3 and CASEX (which has more than two days difference) in another dataframe and delete it from this one.
My approach:
date1 = "10/12/2023"
date2 = "10/19/2023"
date1 = pd.to_datetime(date1, format="%m/%d/%Y").date()
date2 = pd.to_datetime(date2, format="%m/%d/%Y").date()
holidays = [pd.to_datetime("10/17/2023",format="%m/%d/%Y").date()]
days = np.busday_count(date1, date2, holidays=holidays)
In "days" I have the correct number. But I don't get to implement it in dataframe to filter it and extract the rows.
Code
Define a list of holidays
Parse the strings in date columns to datetime type
Cast the dates to
datetime64[D]types then usebusy_daycount to get the diffUse boolean indexing to filter the rows