Pandas isin holidays.country_holidays incorrectly returns only False on 1st attempt but correct results on 2nd attempt


I'm struggling with the behavior of pandas .isin when checking whether local dates are local holidays.

I have a DataFrame X with UTC timestamps, which I convert to local dates, keeping only one row per date in x_daily:

import pandas as pd
import holidays

X = pd.DataFrame({'timestampUtc': pd.date_range("2000-12-25", "2001-01-06", freq="1440min", tz="utc")})
X['local_date'] = X['timestampUtc'].dt.tz_convert(tz='Europe/Berlin').dt.date
x_daily = X[['local_date']].drop_duplicates()

Now it gets weird: when I try to find the local holidays with .isin, it doesn't find any. When I check each element of local_date with in, all holidays are found correctly. Calling .isin again after that also finds the correct holidays.

de_holidays = holidays.country_holidays(country='DE', state='BW')
# 1st try: no holidays found with isin
x_daily['local_date'].isin(de_holidays)
# correct holidays found with list comprehension and 'in'
[x_daily['local_date'].iloc[i] in de_holidays for i in range(x_daily.shape[0])]
# 2nd try: correct holidays found with isin
x_daily['local_date'].isin(de_holidays)

What's a reliable and efficient way to add a boolean column that identifies my local holidays?

I paste the whole code in one block again here:

import pandas as pd
import holidays

X = pd.DataFrame({'timestampUtc': pd.date_range("2000-12-25", "2001-01-06", freq="1440min", tz="utc")})
X['local_date'] = X['timestampUtc'].dt.tz_convert(tz='Europe/Berlin').dt.date
x_daily = X[['local_date']].drop_duplicates()

de_holidays = holidays.country_holidays(country='DE', state='BW')
# 1st try: no holidays found with isin
x_daily['local_date'].isin(de_holidays)
# correct holidays found with list comprehension and 'in'
[x_daily['local_date'].iloc[i] in de_holidays for i in range(x_daily.shape[0])]
# 2nd try: correct holidays found with isin
x_daily['local_date'].isin(de_holidays)


MSpiller On BEST ANSWER

The documentation of the holidays module says:

To maximize speed, the list of holidays is built as needed on the fly, one calendar year at a time. When you instantiate the object, it is empty, but the moment a key is accessed it will build that entire year’s list of holidays. To prepopulate holidays, instantiate the class with the years argument:

us_holidays = holidays.US(years=2020)

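In other words, the object stays empty until you access a key, and only then is that key's year populated.

The implementation of isin converts its argument to a list first, which in your case results in an empty list because no year has been populated yet. The element-wise in checks access keys for 2000 and 2001 and thereby populate those years, which is why the second isin call finds the holidays.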
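To illustrate the mechanism without depending on the library itself, here is a minimal standard-library sketch. The class LazyYearHolidays is a hypothetical toy stand-in, not the real holidays implementation; it only assumes that the object is a dict that fills in a year the first time a date from that year is looked up.

```python
import datetime as dt


class LazyYearHolidays(dict):
    """Hypothetical toy model of a lazily populated holiday dict."""

    def __init__(self):
        super().__init__()
        self.years = set()  # years that have already been populated

    def __contains__(self, key):
        # Populate the year on first lookup (toy rule: only 1 January).
        if key.year not in self.years:
            self.years.add(key.year)
            self[dt.date(key.year, 1, 1)] = "New Year's Day"
        return super().__contains__(key)


lazy = LazyYearHolidays()

# Converting to a list does not trigger population, so it is empty.
before = list(lazy)

# A membership test accesses a key and populates that year.
found = dt.date(2001, 1, 1) in lazy

# Now the same conversion sees the populated entries.
after = list(lazy)

print(before, found, after)
```

Anything that merely iterates over the object, as a list conversion does, sees only what earlier lookups have already populated.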

You could change your code to

de_holidays = holidays.country_holidays(country='DE', state='BW', years=[2000, 2001])

and it should work as expected.