How to import excel with timestamp correctly into dataframe (pandas)

62 Views Asked by At

I have some hydrological data in an excel sheet that uses a "m/d/Y" format.

data_id date            flowrate 
34606   4/30/2021 23:45 -1904.39
34607   4/30/2021 23:50 -1864.59
34608   4/30/2021 23:55 -1772.78
34609   1/5/2021 0:00   -1679.54
34610   1/5/2021 0:05   -1646.46
34611   1/5/2021 0:10   -1527.03

When I import that data into a pandas data frame I see that the format is changed and the index is shifted.

import pandas as pd
DATA = pd.read_excel("DATA.xlsx")
DATA
                      date  flowrate_m3s
34557      4/30/2021 23:45      -1904.39
34558      4/30/2021 23:50      -1864.59
34559      4/30/2021 23:55      -1772.78
34560  2021-01-05 00:00:00      -1679.54
34561  2021-01-05 00:05:00      -1646.46
34562  2021-01-05 00:10:00      -1527.03

It seems that pd.read_excel() assumes the order "d/m/Y" when reading the date, unless m>12. I tried defining the format before reading the data, using

FORMAT = "%m/%d/%Y %H:%M:%S" #
DATA = pd.read_excel("DATA.xlsx",parse_dates=[2], date_format=FORMAT)

but this lead to the exact same outcome.

I also tried transforming DATA.date into datetime, but that is of course only reproducing the wrong dates.

I am out of ideas and hope someone can point me into the right direction.

Edit: More things that did not work out

  • changing the date format in the excel sheet.

Edit: the problem seems to be within the excel dataset. All entries for the first 12 days of the month are in m/d/Y, then the format switches to d/m/Y from the 13th day onwards.

Last edit:

The problem was in the Excel file itself. I am now using the original csv and using

DATA = pd.read_csv("DATA.csv")
FORMAT = "%m/%d/%Y %H:%M" #01/01/2021 00:00
DATA.insert(1, "date_num", pd.to_datetime(DATA.date, format = FORMAT), True)

This solves all my problems

1

There are 1 best solutions below

1
Abdul Moeez On

if you want to convert your date format of date filed you can do like this with any data.xlsx.

import pandas as pd
DATA = pd.read_excel("DATA.xlsx")

# Convert 'date' column to datetime using pd.to_datetime() with format
DATA['date'] = pd.to_datetime(DATA['date'], format='%m/%d/%Y %H:%M:%S')

print(DATA)