I have some hydrological data in an excel sheet that uses a "m/d/Y" format.
data_id date flowrate
34606 4/30/2021 23:45 -1904.39
34607 4/30/2021 23:50 -1864.59
34608 4/30/2021 23:55 -1772.78
34609 1/5/2021 0:00 -1679.54
34610 1/5/2021 0:05 -1646.46
34611 1/5/2021 0:10 -1527.03
When I import that data into a pandas data frame I see that the format is changed and the index is shifted.
import pandas as pd
DATA = pd.read_excel("DATA.xlsx")
DATA
date flowrate_m3s
34557 4/30/2021 23:45 -1904.39
34558 4/30/2021 23:50 -1864.59
34559 4/30/2021 23:55 -1772.78
34560 2021-01-05 00:00:00 -1679.54
34561 2021-01-05 00:05:00 -1646.46
34562 2021-01-05 00:10:00 -1527.03
It seems that pd.read_excel() assumes the order "d/m/Y" when reading the date, unless m>12. I tried defining the format before reading the data, using
FORMAT = "%m/%d/%Y %H:%M:%S" #
DATA = pd.read_excel("DATA.xlsx",parse_dates=[2], date_format=FORMAT)
but this lead to the exact same outcome.
I also tried transforming DATA.date into datetime, but that is of course only reproducing the wrong dates.
I am out of ideas and hope someone can point me into the right direction.
Edit: More things that did not work out
- changing the date format in the excel sheet.
Edit: the problem seems to be within the excel dataset. All entries for the first 12 days of the month are in m/d/Y, then the format switches to d/m/Y from the 13th day onwards.
Last edit:
The problem was in the Excel file itself. I am now using the original csv and using
DATA = pd.read_csv("DATA.csv")
FORMAT = "%m/%d/%Y %H:%M" #01/01/2021 00:00
DATA.insert(1, "date_num", pd.to_datetime(DATA.date, format = FORMAT), True)
This solves all my problems
if you want to convert your date format of date filed you can do like this with any data.xlsx.