I would like to mark the days in my time series (data from China) in an extra column as holiday (boolean True) or non-holiday (boolean False).
I am new to this topic and am currently trying to figure out how to approach the problem.
I have the following days for 2020 as official Chinese holidays:
As far as I know, there is no out-of-the-box calendar for China, so I will have to create a custom calendar as follows:
from pandas.tseries.holiday import Holiday, AbstractHolidayCalendar

class ChineseHolidays(AbstractHolidayCalendar):
    rules = [
        Holiday('Chinese New Year', month=1, day=25),
        # Question: How to add more than one day?
        # etc.
        # ...
    ]

cal = ChineseHolidays()
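The next step would be to create the Holidays column as follows:
# Normalize to midnight so timestamps with a time of day still match a holiday date
days = X['timestamp'].dt.normalize()
holidays = cal.holidays(start=days.min(), end=days.max())
# assign() returns a new DataFrame, so keep the result; isin() already yields booleans
X = X.assign(Holidays=days.isin(holidays))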
My questions here are:
1) Is this, in general, a proper approach?
2) How can I define, in the line Holiday('Chinese New Year', month=1, day=25), that the days off start on the 24th of January and end on the 30th of January? Is there a way to define a range of days off instead of just a single day?
Thanks for your help.
Best,
B.
Chinese holidays follow the lunar calendar, so you can use a library such as LunarCalendar in Python:
pip install LunarCalendar
For Chinese New Year 2020 (the first day of the first lunar month), the conversion returns the Gregorian date 2020-01-25.
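On the question of covering more than one day: a single Holiday rule describes one date, so one possible approach is to generate one rule per day off. The sketch below is only an illustration, assuming the break runs from 24 to 30 January 2020 as stated in the question. The names ChineseHolidays2020, SPRING_FESTIVAL_2020 and mark_holidays are made up for this example, and the sample timestamps are invented.

```python
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday

# Assumed days off, taken from the question: 24-30 January 2020.
SPRING_FESTIVAL_2020 = pd.date_range("2020-01-24", "2020-01-30", freq="D")


class ChineseHolidays2020(AbstractHolidayCalendar):
    # One fixed-date rule per day off. Passing year=... pins the rule to 2020,
    # because the lunar holiday falls on a different Gregorian date each year.
    rules = [
        Holiday(f"Chinese New Year {d:%m-%d}", year=d.year, month=d.month, day=d.day)
        for d in SPRING_FESTIVAL_2020
    ]


def mark_holidays(frame, column="timestamp"):
    """Return a copy of frame with a boolean 'Holidays' column."""
    # Drop the time of day so that e.g. 2020-01-24 12:30 matches 2020-01-24.
    days = frame[column].dt.normalize()
    holidays = ChineseHolidays2020().holidays(start=days.min(), end=days.max())
    return frame.assign(Holidays=days.isin(holidays))


# Invented sample data, just to show the call.
X = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-01-23 08:00",
        "2020-01-24 12:30",
        "2020-01-30 23:59",
        "2020-01-31 00:00",
    ])
})
X = mark_holidays(X)
print(X)
```

If the calendar class is not needed elsewhere, comparing the normalized timestamps directly against a pd.date_range (or a plain list of dates) with isin gives the same column with less code. The calendar is mainly useful when the same rules should also feed pandas business-day offsets.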