Pandas to_datetime() with mixed input string format


Some of the data frames I am analysing contain columns with date information. I am interested in the time of day (HH:MM), which I could initially get with df['columnname'].dt.time. We have now switched from the string format "YYYY-MM-DD HH:MM" to ISO 8601, so an example date string would be "2022-03-04T23:00:00+01:00". Analysing the new data resulted in:

Can only use .dt accessor with datetimelike values

I could reproduce this when I used a mix of both input string formats. Here is an example:

import pandas as pd

#d = {'Timestamp': ['2022-03-05T23:00:00+01:00', '2022-03-04T20:00:00+01:00', '2022-03-04T23:00:00+01:00']} #works
#d = {'Timestamp': ['2024-09-29 10:00', '2024-04-19 21:00', '2024-04-13 13:00']}                            #works, too 
d = {'Timestamp': ['2022-03-05T23:00:00+01:00', '2022-03-04T20:00:00+01:00', '2024-04-13 13:00']}           #does not work

df = pd.DataFrame(data=d)
df_times = pd.to_datetime(df['Timestamp'], errors='coerce')

print(df_times.dt.time)

Each d with a homogeneous format works, but the mixed format results in the error above. I then tried:

df_times = pd.to_datetime(df['Timestamp'], format='mixed', errors='coerce')

In this case there is no error, but every output value is NaT, even when all the date strings share the same format.

I am confused by this behaviour, because parsing such strings seems to be within the regular scope of to_datetime. Can anyone please help me understand this? I would prefer a solution based on to_datetime (rather than string manipulation), because that would let me access other date attributes later on as well.

Thanks in advance!
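
A possible direction, offered as a sketch rather than a confirmed diagnosis: the example mixes strings that carry a UTC offset with strings that do not, and format='mixed' was only added in pandas 2.0. On an older version the word 'mixed' may be treated as a literal format pattern, which would explain why errors='coerce' turns every value into NaT. Assuming pandas 2.0 or newer, passing utc=True together with format='mixed' should give a single timezone-aware column on which .dt works. Note that the times are then expressed in UTC, and that strings without an offset are assumed to already be in UTC. The variable names parsed and wall_clock are only illustrative.

```python
import pandas as pd

d = {'Timestamp': ['2022-03-05T23:00:00+01:00', '2022-03-04T20:00:00+01:00', '2024-04-13 13:00']}
df = pd.DataFrame(data=d)

# Assumption: pandas >= 2.0, where format='mixed' parses each element on its own.
# utc=True converts offset-aware strings to UTC and treats naive strings as UTC,
# so the result is one timezone-aware datetime column.
parsed = pd.to_datetime(df['Timestamp'], format='mixed', utc=True)
print(parsed.dt.time)  # times of day in UTC, so 23:00+01:00 shows as 22:00

# Alternative that keeps the clock time exactly as written in each string:
# parse element by element and take the time part without any conversion.
wall_clock = df['Timestamp'].map(lambda text: pd.Timestamp(text).time())
print(wall_clock)
```

If the original clock time (23:00 rather than 22:00) is what matters, the second variant avoids the UTC shift, at the cost of producing plain time objects instead of a datetime column.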
