Cleaning inconsistent date formatting in pandas dataframe


I have a very large dataframe in which one of the columns, ['date'], holds datetimes stored as strings (the dtype is still string), formatted as below. The time is sometimes displayed as hh:mm:ss and sometimes as h:mm:ss (for hours 9 and earlier):

Tue Mar 1 9:23:58 2016
Tue Mar 1 9:29:04 2016 
Tue Mar 1 9:42:22 2016
Tue Mar 1 09:43:50 2016

pd.to_datetime() won't work when I try to convert the strings to datetime, so I was hoping for some help adding the missing leading 0 to the hour.

Any help is greatly appreciated!


There are 3 best solutions below

1
On
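import pandas as pd

# Sample strings with both h:mm:ss and hh:mm:ss times
date_stngs = ('Tue Mar 1 9:23:58 2016','Tue Mar 1 9:29:04 2016','Tue Mar 1 9:42:22 2016','Tue Mar 1 09:43:50 2016')
# Parse each string on its own, so the mixed hour widths are not a problem
a = pd.Series([pd.to_datetime(date) for date in date_stngs])
print(a)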

Output

0   2016-03-01 09:23:58
1   2016-03-01 09:29:04
2   2016-03-01 09:42:22
3   2016-03-01 09:43:50
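
If the per-string loop is too slow on a very large frame, a single vectorized call with an explicit format may be worth trying. This is only a sketch and not part of the original answer: the `raw` Series and the format string are assumptions based on the sample rows in the question, and it assumes an English locale for the weekday and month names. As far as I know, %d and %H accept single-digit values when parsing, so the missing zero should not matter.

```python
import pandas as pd

# Hypothetical sample column built from the strings in the question
raw = pd.Series([
    'Tue Mar 1 9:23:58 2016',
    'Tue Mar 1 9:29:04 2016',
    'Tue Mar 1 9:42:22 2016',
    'Tue Mar 1 09:43:50 2016',
])

# Strip stray whitespace, then parse the whole column with one explicit format
parsed = pd.to_datetime(raw.str.strip(), format='%a %b %d %H:%M:%S %Y')
print(parsed)
```

Stripping first is a precaution: a trailing space would otherwise not match the format.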
0
On
# Pull the pieces out of each string, e.g. 'Tue Mar 1 9:23:58 2016'
time = df[0].str.split().str.get(3)         # '9:23:58'
year = df[0].str.strip().str[-4:]           # '2016'
daynmonth = df[0].str[:10].str.strip()      # 'Tue Mar 1'

# Reassemble with the year ahead of the time
df['date'] = daynmonth + ' ' + year + ' ' + time

df['date'] = pd.to_datetime(df['date'])

I found this worked for me: rearranging the pieces so that the year comes before the time.
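
The same reordering can probably be done in one step with a regular expression. A rough sketch, assuming every row looks like the samples in the question (weekday, month, day, time, year separated by single spaces); `raw`, `pattern`, `reordered` and `parsed` are names made up for the illustration:

```python
import pandas as pd

# Hypothetical sample column with one short and one zero-padded hour
raw = pd.Series(['Tue Mar 1 9:23:58 2016', 'Tue Mar 1 09:43:50 2016'])

# Capture (weekday month day), (time), (year) and emit them as day part, year, time
pattern = r'^(\w+ \w+ \d+) (\d+:\d+:\d+) (\d+)$'
reordered = raw.str.strip().str.replace(pattern, r'\1 \3 \2', regex=True)

# Parse the reordered strings with a format matching the new order
parsed = pd.to_datetime(reordered, format='%a %b %d %Y %H:%M:%S')
print(parsed)
```

Rows that do not match the pattern are left unchanged by str.replace, so they would then fail the explicit format rather than be parsed silently.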

0
On

Assuming you have a one-column DataFrame holding the strings above, with column name 0, the following splits each string on whitespace, takes the time field (index 3, i.e. the fourth piece) and zero-fills it to 8 characters with zfill.

Assuming this starting df:

                         0
0   Tue Mar 1 9:23:58 2016
1   Tue Mar 1 9:29:04 2016
2   Tue Mar 1 9:42:22 2016
3  Tue Mar 1 09:43:50 2016

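# Split each string on whitespace into columns 0..4 (weekday, month, day, time, year)
df1 = df[0].str.split(expand=True)
# Left-pad the time column with zeros to 8 characters: '9:23:58' -> '09:23:58'
df1[3] = df1[3].str.zfill(8)
# Join the pieces back into one string per row and parse
pd.to_datetime(df1.apply(lambda x: ' '.join(x.tolist()), axis=1))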

Output

0   2016-03-01 09:23:58
1   2016-03-01 09:29:04
2   2016-03-01 09:42:22
3   2016-03-01 09:43:50
dtype: datetime64[ns]
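
If only the padded strings are needed, the zero can also be added in place with a regex instead of splitting and rejoining. This is a sketch under the assumption that the time always has the shape h:mm:ss or hh:mm:ss; the `padded` name and the pattern are illustrative, not from the answer above:

```python
import pandas as pd

# Same starting frame as assumed above: one column named 0
df = pd.DataFrame({0: [
    'Tue Mar 1 9:23:58 2016',
    'Tue Mar 1 9:29:04 2016',
    'Tue Mar 1 9:42:22 2016',
    'Tue Mar 1 09:43:50 2016',
]})

# A lone digit at a word boundary followed by :mm:ss gets a leading zero;
# times that already have two hour digits do not match and are left alone
padded = df[0].str.replace(r'\b(\d):(\d{2}):(\d{2})\b', r'0\1:\2:\3', regex=True)
print(padded.tolist())
```

The result can then be passed to pd.to_datetime in the same way as the rejoined strings.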