Python Pandas filter rows by days of difference in two columns with weekend and holidays

Question

Python Pandas filter rows by days of difference in two columns with weekend and holidays

74 Views Asked by Gustavo At 11 October 2023 at 03:24

I have a dataframe with two dates, among other things. I need to filter out rows that have more than two working days difference between these two dates. I must take into consideration weekends and holidays.

*Assuming 10/17/2023 is a holiday...

Example df:

NAME     DATE1         DATE2
CASE1    10/12/2023    10/13/2023  <--- one day difference
CASE2    10/12/2023    10/16/2023  <--- two days difference (weekend)
CASE3    10/12/2023    10/18/2023  <--- three days difference (weekends and holidays)
...
CASEX    10/12/2023    10/19/2023  <--- four days difference (weekends and holidays)

I need to save CASE3 and CASEX (which has more than two days difference) in another dataframe and delete it from this one.

My approach:

date1 = "10/12/2023"
date2 = "10/19/2023"
date1 = pd.to_datetime(date1, format="%m/%d/%Y").date()
date2 = pd.to_datetime(date2, format="%m/%d/%Y").date()
holidays = [pd.to_datetime("10/17/2023",format="%m/%d/%Y").date()]
days = np.busday_count(date1, date2, holidays=holidays)

In "days" I have the correct number. But I don't get to implement it in dataframe to filter it and extract the rows.

Original Q&A

There are 2 best solutions below

Panda Kim On 11 October 2023 at 05:45

Code

You can use the apply() function to count the number of bdate_range values in each row.

cond = df.apply(lambda x: len(pd.bdate_range(x['DATE1'], x['DATE2'], holidays=['2023-10-17'], freq='C')), axis=1).gt(3)
df[cond]

output:

    NAME    DATE1       DATE2
2   CASE3   10/12/2023  10/18/2023
3   CASE4   10/12/2023  10/19/2023

Example

import pandas as pd
data = {'NAME': ['CASE1', 'CASE2', 'CASE3', 'CASE4'], 
        'DATE1': ['10/12/2023', '10/12/2023', '10/12/2023', '10/12/2023'], 
        'DATE2': ['10/13/2023', '10/16/2023', '10/18/2023', '10/19/2023']}
df = pd.DataFrame(data)

df

    NAME    DATE1       DATE2
0   CASE1   10/12/2023  10/13/2023
1   CASE2   10/12/2023  10/16/2023
2   CASE3   10/12/2023  10/18/2023
3   CASE4   10/12/2023  10/19/2023

**Shubham Sharma** · Accepted Answer · 2023-10-11T04:11:34.050000

Code

Define a list of holidays

holidays = np.array([pd.to_datetime("10/17/2023", format="%m/%d/%Y")], dtype='datetime64[D]')

Parse the strings in date columns to datetime type

df['DATE1'] = pd.to_datetime(df['DATE1'], format="%m/%d/%Y")
df['DATE2'] = pd.to_datetime(df['DATE2'], format="%m/%d/%Y")

#     NAME      DATE1      DATE2
# 0  CASE1 2023-10-12 2023-10-13
# 1  CASE2 2023-10-12 2023-10-16
# 2  CASE3 2023-10-12 2023-10-18
# 3  CASEX 2023-10-12 2023-10-19

Cast the dates to datetime64[D] types then use busy_day count to get the diff

days = np.busday_count(df['DATE1'].values.astype("datetime64[D]"), 
                       df['DATE2'].values.astype("datetime64[D]"), 
                       holidays=holidays)

# array([1, 2, 3, 4])

Use boolean indexing to filter the rows

valid_rows = df[days <= 2]
invalid_rows = df[days > 2]

# valid_rows 
#     NAME      DATE1      DATE2
# 0  CASE1 2023-10-12 2023-10-13
# 1  CASE2 2023-10-12 2023-10-16

# invalid_rows 
#     NAME      DATE1      DATE2
# 2  CASE3 2023-10-12 2023-10-18
# 3  CASEX 2023-10-12 2023-10-19

Python Pandas filter rows by days of difference in two columns with weekend and holidays

There are 2 best solutions below

Code

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in DATE-DIFFERENCE

Related Questions in WEEKEND

Trending Questions

Popular # Hahtags

Popular Questions