UTC offset not correct using tz_localize and ZoneInfo


I have a pd.DatetimeIndex object in local time without UTC offsets. I am now trying to localize this object using tz_localize and a ZoneInfo object so that it contains UTC offset information, i.e. turn '2023-10-29 02:00:00' into '2023-10-29 02:00:00+02:00' for a specific timezone (e.g. 'Europe/Amsterdam').

For replication purposes, the code below builds the object I am working with.

import pandas as pd
from zoneinfo import ZoneInfo

pd_local_no_tz = pd.date_range('2023-10-29', '2023-10-30', freq='H', tz=ZoneInfo('Europe/Amsterdam')).tz_localize(None)

Out[17]: 
DatetimeIndex(['2023-10-29 00:00:00', '2023-10-29 01:00:00',
               '2023-10-29 02:00:00', '2023-10-29 02:00:00',
               '2023-10-29 03:00:00', '2023-10-29 04:00:00',
               '2023-10-29 05:00:00', '2023-10-29 06:00:00',
               '2023-10-29 07:00:00', '2023-10-29 08:00:00',
               '2023-10-29 09:00:00', '2023-10-29 10:00:00',
               '2023-10-29 11:00:00', '2023-10-29 12:00:00',
               '2023-10-29 13:00:00', '2023-10-29 14:00:00',
               '2023-10-29 15:00:00', '2023-10-29 16:00:00',
               '2023-10-29 17:00:00', '2023-10-29 18:00:00',
               '2023-10-29 19:00:00', '2023-10-29 20:00:00',
               '2023-10-29 21:00:00', '2023-10-29 22:00:00',
               '2023-10-29 23:00:00', '2023-10-30 00:00:00'],
              dtype='datetime64[ns]', freq=None)

The naive pd.DatetimeIndex already reflects the clock change: '2023-10-29 02:00:00' appears twice (without a UTC offset), once before and once after clocks are set back.
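To see why the two identical readings need different offsets, here is a minimal standard-library sketch (not from the original post). It uses the `fold` attribute from PEP 495, which `zoneinfo` uses to choose between the two instants that share a repeated wall-clock time.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

ams = ZoneInfo("Europe/Amsterdam")

# Same wall-clock reading: fold=0 selects the earlier (summer time) instant,
# fold=1 selects the later (standard time) instant.
first = datetime(2023, 10, 29, 2, 0, tzinfo=ams)
second = datetime(2023, 10, 29, 2, 0, fold=1, tzinfo=ams)

print(first.utcoffset())   # 2:00:00
print(second.utcoffset())  # 1:00:00
```

A naive timestamp carries no `fold`, so something (here `ambiguous='infer'`) has to decide which of the two offsets each repeated entry gets.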

I am now trying to localize the DatetimeIndex the following way using ZoneInfo:

pd_local = pd_local_no_tz.tz_localize(ZoneInfo('Europe/Amsterdam'), ambiguous='infer')

However, the resulting pd.DatetimeIndex contains the entry '2023-10-29 02:00:00+02:00' twice, although the second one should have a different UTC offset, namely '2023-10-29 02:00:00+01:00':

Out[21]: 
DatetimeIndex(['2023-10-29 00:00:00+02:00', '2023-10-29 01:00:00+02:00',
               '2023-10-29 02:00:00+02:00', '2023-10-29 02:00:00+02:00',
               '2023-10-29 03:00:00+01:00', '2023-10-29 04:00:00+01:00',
               '2023-10-29 05:00:00+01:00', '2023-10-29 06:00:00+01:00',
               '2023-10-29 07:00:00+01:00', '2023-10-29 08:00:00+01:00',
               '2023-10-29 09:00:00+01:00', '2023-10-29 10:00:00+01:00',
               '2023-10-29 11:00:00+01:00', '2023-10-29 12:00:00+01:00',
               '2023-10-29 13:00:00+01:00', '2023-10-29 14:00:00+01:00',
               '2023-10-29 15:00:00+01:00', '2023-10-29 16:00:00+01:00',
               '2023-10-29 17:00:00+01:00', '2023-10-29 18:00:00+01:00',
               '2023-10-29 19:00:00+01:00', '2023-10-29 20:00:00+01:00',
               '2023-10-29 21:00:00+01:00', '2023-10-29 22:00:00+01:00',
               '2023-10-29 23:00:00+01:00', '2023-10-30 00:00:00+01:00'],
              dtype='datetime64[ns, Europe/Amsterdam]', freq=None)

When I pass the time zone as a string instead of a ZoneInfo object, the UTC offsets are correct:

pd_local = pd_local_no_tz.tz_localize('Europe/Amsterdam', ambiguous='infer')

Out[23]: 
DatetimeIndex(['2023-10-29 00:00:00+02:00', '2023-10-29 01:00:00+02:00',
               '2023-10-29 02:00:00+02:00', '2023-10-29 02:00:00+01:00',
               '2023-10-29 03:00:00+01:00', '2023-10-29 04:00:00+01:00',
               '2023-10-29 05:00:00+01:00', '2023-10-29 06:00:00+01:00',
               '2023-10-29 07:00:00+01:00', '2023-10-29 08:00:00+01:00',
               '2023-10-29 09:00:00+01:00', '2023-10-29 10:00:00+01:00',
               '2023-10-29 11:00:00+01:00', '2023-10-29 12:00:00+01:00',
               '2023-10-29 13:00:00+01:00', '2023-10-29 14:00:00+01:00',
               '2023-10-29 15:00:00+01:00', '2023-10-29 16:00:00+01:00',
               '2023-10-29 17:00:00+01:00', '2023-10-29 18:00:00+01:00',
               '2023-10-29 19:00:00+01:00', '2023-10-29 20:00:00+01:00',
               '2023-10-29 21:00:00+01:00', '2023-10-29 22:00:00+01:00',
               '2023-10-29 23:00:00+01:00', '2023-10-30 00:00:00+01:00'],
              dtype='datetime64[ns, Europe/Amsterdam]', freq=None)

How can I get the correct UTC offset when transforming the DatetimeIndex object using a ZoneInfo object?

I use Python 3.9 and pandas 1.5.3. It's important for me to use ZoneInfo because pytz does not handle clock changes past 2037; the goal is to attach correct UTC offsets to timestamps beyond 2037 as well.
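One possible workaround on pandas 1.5.3 is to resolve the ambiguity with the standard library instead of `ambiguous='infer'`, convert each timestamp to UTC (where nothing is ambiguous), and only then convert to the ZoneInfo zone. This is a sketch under stated assumptions: `localize_via_fold` is a hypothetical helper, it assumes the index is chronological and that a repeated wall-clock time is listed first in summer time and then in standard time, and it does not detect nonexistent (spring-forward) times. It also assumes that converting from UTC to a ZoneInfo zone works on pandas 1.5.3, which the question's own `date_range(..., tz=ZoneInfo(...))` snippet suggests. Because the offsets come from `zoneinfo`, dates after 2037 are handled as well.

```python
from datetime import timezone
from zoneinfo import ZoneInfo

import pandas as pd


def localize_via_fold(naive_index, tz):
    """Hypothetical helper: localize a naive, chronologically ordered
    DatetimeIndex without tz_localize(..., ambiguous="infer").

    Assumes each ambiguous wall-clock time appears at most twice (summer-time
    reading first, standard-time reading second) and that there are no
    nonexistent spring-forward times, which this sketch does not check.
    """
    seen = set()
    utc_times = []
    for ts in naive_index:
        wall = ts.to_pydatetime()
        # A wall-clock reading seen before is the later instant, i.e. fold=1.
        fold = 1 if wall in seen else 0
        seen.add(wall)
        aware = wall.replace(tzinfo=tz, fold=fold)
        utc_times.append(aware.astimezone(timezone.utc))
    # Build the index in UTC, where no time is ambiguous, then convert back.
    return pd.DatetimeIndex(utc_times).tz_convert(tz)


tz = ZoneInfo("Europe/Amsterdam")
# Clocks go back on the last Sunday of October; in 2040 that is 28 October.
naive = pd.DatetimeIndex(
    ["2040-10-28 01:00", "2040-10-28 02:00", "2040-10-28 02:00", "2040-10-28 03:00"]
)
print(localize_via_fold(naive, tz))
```

The loop runs in Python, so it is slower than a vectorized `tz_localize`; for the hourly data in the question that should not matter.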

Answer by Markus:

It seems that the issue has been fixed in pandas 2.0, which now returns the correct UTC offset:

import pandas as pd
from zoneinfo import ZoneInfo
pd_local_no_tz = pd.date_range('2023-10-29', '2023-10-30', freq='H', tz=ZoneInfo('Europe/Amsterdam')).tz_localize(None)
pd_local = pd_local_no_tz.tz_localize(ZoneInfo('Europe/Amsterdam'), ambiguous='infer')
pd_local
Out[6]: 
DatetimeIndex(['2023-10-29 00:00:00+02:00', '2023-10-29 01:00:00+02:00',
               '2023-10-29 02:00:00+02:00', '2023-10-29 02:00:00+01:00',
               '2023-10-29 03:00:00+01:00', '2023-10-29 04:00:00+01:00',
               '2023-10-29 05:00:00+01:00', '2023-10-29 06:00:00+01:00',
               '2023-10-29 07:00:00+01:00', '2023-10-29 08:00:00+01:00',
               '2023-10-29 09:00:00+01:00', '2023-10-29 10:00:00+01:00',
               '2023-10-29 11:00:00+01:00', '2023-10-29 12:00:00+01:00',
               '2023-10-29 13:00:00+01:00', '2023-10-29 14:00:00+01:00',
               '2023-10-29 15:00:00+01:00', '2023-10-29 16:00:00+01:00',
               '2023-10-29 17:00:00+01:00', '2023-10-29 18:00:00+01:00',
               '2023-10-29 19:00:00+01:00', '2023-10-29 20:00:00+01:00',
               '2023-10-29 21:00:00+01:00', '2023-10-29 22:00:00+01:00',
               '2023-10-29 23:00:00+01:00', '2023-10-30 00:00:00+01:00'],
              dtype='datetime64[ns, Europe/Amsterdam]', freq=None)
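A quick way to check which behaviour an installed pandas version has is to measure the real elapsed time between the two 02:00 entries. This is a sketch, not part of the original answer; `pd.offsets.Hour()` stands in for `freq='H'`, since newer pandas releases deprecate the `'H'` alias in favour of `'h'`. With correctly inferred offsets the gap is one hour; with both entries at +02:00, as in the question's pandas 1.5.3 output, they denote the same instant and the gap is zero.

```python
from zoneinfo import ZoneInfo

import pandas as pd

tz = ZoneInfo("Europe/Amsterdam")
naive = pd.date_range(
    "2023-10-29", "2023-10-30", freq=pd.offsets.Hour(), tz=tz
).tz_localize(None)
localized = naive.tz_localize(tz, ambiguous="infer")

# Positions 2 and 3 hold the two 02:00 readings; subtracting tz-aware
# timestamps compares the underlying instants, not the wall-clock times.
gap = localized[3] - localized[2]
print(pd.__version__, gap)
```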