Pandas read_csv is not parsing date and time

59 Views Asked by At

I have a csv file which looks like this

Study ID,CG_Arrival_Date/Time,Arrival_Date,Arrival_Time
2,1/1/2011 0:03,1/1/2011,0:03:00
3,1/1/2011 0:53,1/1/2011,0:53:00

I wanted to parse arrival date and arrival time, so I worte this on Google Colab.

df = pd.read_csv('/content/SM_AI_Data.csv', index_col=['Study ID'],parse_dates=[['Arrival Date','Arrival Time']])
df.head()

The columns get combined fine, but its data type is still object.

How can I change the code so that it works?

Please see the details above.

1

There are 1 best solutions below

4
On BEST ANSWER

I can't reproduce any problem if I specify the columns correctly. I use the indexes here because I'm too lazy to use the names :

import pandas as pd
from io import StringIO
csv_text="""
Study ID,Arrival_Date/Time,Arrival_Date,Arrival_Time 
2,1/1/2011 0:03,1/1/2011,0:03:00
3,1/1/2011 0:53,1/1/2011,0:53:00"""

df = pd.read_csv(StringIO(csv_text), index_col=['Study ID'],parse_dates=[1,2])
df.dtypes

This produces

Arrival_Date/Time    datetime64[ns]
Arrival_Date         datetime64[ns]
Arrival_Time                 object

The contents are:

    Arrival_Date/Time   Arrival_Date    Arrival_Time
Study ID            
2   2011-01-01 00:03:00 2011-01-01  0:03:00
3   2011-01-01 00:53:00 2011-01-01  0:53:00

The Date and Time columns can be combined too :

import pandas as pd
from io import StringIO
csv_text="""
Study ID,Arrival_Date/Time,Arrival_Date,Arrival_Time 
2,1/1/2011 0:03,1/1/2011,0:03:00
3,1/1/2011 0:53,1/1/2011,0:53:00"""

df = pd.read_csv(StringIO(csv_text), index_col=['Study ID'],parse_dates=[[2,3],1])
df.dtypes

The contents are

    Arrival_Date_Arrival_Time   Arrival_Date/Time
Study ID        
2   2011-01-01 00:03:00 2011-01-01 00:03:00
3   2011-01-01 00:53:00 2011-01-01 00:53:00

It's not possible to tell what the format is from the values. DD/MM/YYYY or MM/DD/YYYY? The correct format can be specified by setting the dayfirst parameter to True or False`