I am trying to clean a spreadsheet of user-entered data that includes a "birth_date" column. The issue I am having is that the date formatting varies widely between users, including inputs with no separators between the month, day, and year. I am having a hard time developing a formula that is intelligent enough to interpret such a wide range of inputs. Here is a sample:
1/6/46
7/28/99
11272000
11/28/78
Here is where I started:
df['birth_date'] = pd.to_datetime(df['birth_date'])
This does not seem to make it past the first example, as it looks for a two-digit month format. Can anyone help with this?
Your best bet is to check each input against the formats you expect and produce a consistent output. Assuming month-day-year order, you can use a function like this:
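A minimal sketch of such a function, using only the standard library. It rests on several assumptions that the data may not satisfy: inputs are in month-day-year order, inputs without separators are zero-padded (MMDDYY or MMDDYYYY), and a two-digit year never refers to a future year. The name `parse_birth_date` and the `today` parameter are illustrative.

```python
import re
from datetime import date
from typing import Optional


def parse_birth_date(raw: object, today: Optional[date] = None) -> Optional[date]:
    """Parse a month-day-year value, with or without separators.

    Returns None when the input does not match a known pattern
    or is not a valid calendar date.
    """
    today = today or date.today()
    text = str(raw).strip()

    # Case 1: separated input such as 1/6/46, 7-28-99 or 11/28/1978.
    match = re.fullmatch(r"(\d{1,2})\D+(\d{1,2})\D+(\d{2}|\d{4})", text)
    if match:
        month_s, day_s, year_s = match.groups()
    # Case 2: digits only, assumed zero-padded: MMDDYY or MMDDYYYY.
    elif re.fullmatch(r"\d{6}|\d{8}", text):
        month_s, day_s, year_s = text[:2], text[2:4], text[4:]
    else:
        return None

    year = int(year_s)
    if len(year_s) == 2:
        # Assumption: a birth year is never later than the current year,
        # so use 20YY only when that is not in the future, else 19YY.
        year += 2000 if 2000 + year <= today.year else 1900

    try:
        return date(year, int(month_s), int(day_s))
    except ValueError:
        # Impossible dates such as month 13 or day 40.
        return None
```

With pandas, this could probably be applied as `df['birth_date'] = pd.to_datetime(df['birth_date'].map(parse_birth_date))`, which leaves unparseable rows as missing values so they can be reviewed by hand. Digit-only inputs without zero-padding (for example `1646`) are genuinely ambiguous, so the sketch returns None for them rather than guessing.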