I'm struggling with the behavior of pandas .isin when checking whether local dates are local holidays.
I have a DataFrame X with UTC timestamps, which I convert to local dates, keeping only one row per date in x_daily:
import pandas as pd
import holidays
X = pd.DataFrame({'timestampUtc': pd.date_range("2000-12-25", "2001-01-06", freq="1440min", tz="utc")})
X['local_date'] = X['timestampUtc'].dt.tz_convert(tz='Europe/Berlin').dt.date
x_daily = X[['local_date']].drop_duplicates()
Now it gets weird: when I try to find the local holidays with .isin, it doesn't find any. When I check each element of local_date with in, all holidays are found correctly. Calling .isin again after that also finds the correct holidays.
de_holidays = holidays.country_holidays(country='DE', state='BW')
# 1st try: no holidays found with isin
x_daily['local_date'].isin(de_holidays)
# correct holidays found with list comprehension and 'in'
[x_daily['local_date'].iloc[i] in de_holidays for i in range(x_daily.shape[0])]
# 2nd try: correct holidays found with isin
x_daily['local_date'].isin(de_holidays)
What's a reliable and efficient way to add a boolean column that identifies my local holidays?
I paste the whole code in one block again here:
import pandas as pd
import holidays
X = pd.DataFrame({'timestampUtc': pd.date_range("2000-12-25", "2001-01-06", freq="1440min", tz="utc")})
X['local_date'] = X['timestampUtc'].dt.tz_convert(tz='Europe/Berlin').dt.date
x_daily = X[['local_date']].drop_duplicates()
de_holidays = holidays.country_holidays(country='DE', state='BW')
# 1st try: no holidays found with isin
x_daily['local_date'].isin(de_holidays)
# correct holidays found with list comprehension and 'in'
[x_daily['local_date'].iloc[i] in de_holidays for i in range(x_daily.shape[0])]
# 2nd try: correct holidays found with isin
x_daily['local_date'].isin(de_holidays)

The documentation of the holidays module says:
I.e. you have to access the list first and it will start to populate it.
The implementation of isin will convert the argument to a list first, which in your case results in an empty list. You could change your code to
de_holidays = holidays.country_holidays(country='DE', state='BW', years=[2000, 2001])
and it should work as expected.
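To make the mechanism easier to see, here is a minimal standard-library sketch of a lazily populated container. The class LazyHolidays and its two-holiday rule are hypothetical and only mimic the behavior described above; this is not the holidays library's actual implementation. It shows why a list conversion sees nothing until a lookup with in has happened, and why passing the years up front avoids the problem.

```python
from datetime import date


class LazyHolidays(dict):
    """Hypothetical stand-in for a calendar that fills itself on lookup."""

    def __init__(self, years=()):
        super().__init__()
        self._years = set()
        # Years given up front are populated immediately.
        for year in years:
            self._ensure(year)

    def _ensure(self, year):
        # Populate a year only once. Toy rule for illustration only:
        # New Year's Day and Christmas Day.
        if year not in self._years:
            self._years.add(year)
            self[date(year, 1, 1)] = "New Year's Day"
            self[date(year, 12, 25)] = "Christmas Day"

    def __contains__(self, key):
        # A membership test triggers population of the key's year.
        self._ensure(key.year)
        return super().__contains__(key)


dates = [date(2000, 12, 25), date(2000, 12, 27), date(2001, 1, 1)]

lazy = LazyHolidays()
before = list(lazy)                 # list conversion before any lookup
by_in = [d in lazy for d in dates]  # each lookup populates that date's year
after = list(lazy)                  # list conversion after the lookups

# With the years given up front, the list conversion is already complete.
eager = LazyHolidays(years=[2000, 2001])
by_list = [d in list(eager) for d in dates]

print(before, by_in, len(after), by_list)
```

In this sketch the first list conversion is empty, the in lookups fill in both years, and the second conversion then contains their entries, which matches the first-try and second-try behavior in the question.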