Is there a way in Python to mark Chinese holidays in a pandas time series?

I would like to mark the days in my time series (data from China) in an extra column as holiday (boolean True) or non-holiday (boolean False).

I am new to this topic and am currently trying to figure out how to approach the problem.

I have the following days as official Chinese holidays for 2020:

Chinese Holidays 2020

As far as I know, there is no out-of-the-box calendar for China, so I will have to create a custom calendar as follows:

from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday

class ChineseHolidays(AbstractHolidayCalendar):
    rules = [
        Holiday('Chinese New Year', month=1, day=25),
        # Question: how to add more than one day?
        # etc.
    ]

cal = ChineseHolidays()

The next step would be to create the Holidays column as follows:

holidays = cal.holidays(start=X['timestamp'].min(), end=X['timestamp'].max())

# assign returns a new DataFrame, so keep the result
X = X.assign(Holidays=X['timestamp'].isin(holidays).astype(int))

My questions here are:

1) Is this a proper approach in general?

2) How can I specify in the line Holiday('Chinese New Year', month=1, day=25) that the days off start on the 24th of January and end on the 30th of January? Is there a way to define a range of days off instead of just one day?

Thanks for your help.

Best,

B.

There are 2 best solutions below

0
MaxxxZ On

Traditional Chinese holidays follow the lunar calendar, so you can use a Python library such as LunarCalendar:

pip install LunarCalendar

import datetime
from lunarcalendar import Converter, Solar, Lunar, DateNotExist

l = Lunar(year=2020, month=1, day=1, isleap=False)
print(Converter.Lunar2Solar(l))

This prints the corresponding solar (Gregorian) date, 2020-01-25.
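Once the solar dates are known, one way to register a multi-day holiday in a pandas calendar is to create one fixed-date `Holiday` rule per day off. The following is only a sketch: the 24 to 30 January range comes from the question, the dates are hardcoded rather than converted from the lunar calendar, and the class name `ChineseHolidays2020` and the rule names are made up for illustration.

```python
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday

# Days off for the 2020 Spring Festival (range taken from the question).
spring_festival_2020 = pd.date_range("2020-01-24", "2020-01-30", freq="D")


class ChineseHolidays2020(AbstractHolidayCalendar):
    # One rule per day off; passing `year` pins each rule to a single date.
    rules = [
        Holiday(f"Spring Festival {d:%Y-%m-%d}", year=d.year, month=d.month, day=d.day)
        for d in spring_festival_2020
    ]


cal = ChineseHolidays2020()
# DatetimeIndex of all holidays that fall inside the requested window.
holidays = cal.holidays(start="2020-01-01", end="2020-12-31")
print(holidays)
```

Because the lunar dates shift every year, rules pinned with `year` have to be listed again for each year that the data covers.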

0
Petriborg On

It looks to me like pandas has a number of date methods that support periods and repeating dates.

https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html

They also mention using this for holidays, so I suspect this might be what you're looking for.

Example

In [86]: pd.date_range('2018-01-01', '2018-01-05', periods=5)
Out[86]: 
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
               '2018-01-05'],
              dtype='datetime64[ns]', freq=None)
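Building on `pd.date_range`, here is a rough sketch of how a block of days off could be expanded into individual dates and used to fill a boolean column. The toy frame `X`, the column name `holiday` and the list `holiday_ranges` are assumptions for illustration; only the `timestamp` column name and the 24 to 30 January range come from the question.

```python
import pandas as pd

# Toy data standing in for the question's X: daily rows stamped at noon.
X = pd.DataFrame(
    {"timestamp": pd.date_range("2020-01-22 12:00", periods=10, freq="D")}
)

# Each holiday is given as an inclusive (start, end) pair.
holiday_ranges = [("2020-01-24", "2020-01-30")]

# Expand every range into its individual calendar days.
holiday_days = pd.DatetimeIndex(
    [day for start, end in holiday_ranges for day in pd.date_range(start, end, freq="D")]
)

# normalize() drops the time of day so intraday timestamps still match a date.
X["holiday"] = X["timestamp"].dt.normalize().isin(holiday_days)
print(X)
```

This gives a True/False column directly, and more holidays can be added by appending further pairs to `holiday_ranges`.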