read_csv() parsing failures - using problems() doesn't reveal what the issue is

I am trying to read in a dataset that has a column containing dates. By default, this column is read in as character, but I want it read in as a date.

If I read the file with read_csv() using the defaults, the dates in this column display like this in the viewer: 12/04/2019 (i.e., dmy).

However, when using the following I get parsing failures:

data <- read_csv("file.csv",
                 col_types = cols(dob = col_date("%d-%m-%Y")))

Warning: 4160 parsing failures.
row col           expected     actual
  1 dob date like %d-%m-%Y 12/04/2019

At first, I thought this was because I had specified hyphens (-) in col_date(). But I get the same errors if I change the hyphens to forward slashes:

data <- read_csv("file.csv",
                 col_types = cols(dob = col_date("%d/%m/%Y")))

Warning: 4160 parsing failures.
row col           expected     actual
  1 dob date like %d/%m/%Y 12-04-2019

Using problems() just repeats this information in more detail. I'm struggling to know how to proceed, because nothing I change in col_date() seems to help. In fact, as can be seen above, the warning then reports that the opposite format was found in the file.
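A small sketch of why both attempts can fail at once, assuming the column mixes `/` and `-` separators (the two warnings report both `12/04/2019` and `12-04-2019` as actual values). The vector `x` is made-up sample data, not the real file. readr's `parse_date()` takes the same format strings as `col_date()`, and `problems()` also works on a parsed vector, so the mismatch can be reproduced without the CSV:

```r
library(readr)

# Made-up values mixing the two separators seen in the warnings above
x <- c("12/04/2019", "12-04-2019", "20/04/2020")

# Literal characters in the format must match exactly, so each format
# string only parses the values that use its own separator
with_dash  <- suppressWarnings(parse_date(x, format = "%d-%m-%Y"))
with_slash <- suppressWarnings(parse_date(x, format = "%d/%m/%Y"))

with_dash   # NA for the "/" values
with_slash  # NA for the "-" value

# problems() lists the failing rows, with the raw text in `actual`
problems(with_slash)
```

Whichever separator is chosen, the rows using the other one end up in `problems()`, which matches the "opposite formatting" reported in each warning.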

EDIT: Trying the suggestion from Bernhard

I read in the column as character and ran the following:

head(data$dob, 20)

[1] "12/04/20" "20/04/2020" "20/04/2020" "20/04/2020" "20/04/2020" "20/04/2020" "20/04/2020" "20/04/2020" "20/04/2020" "20/04/2020" "20/04/2020"
[12] "12/04/2019" "12/04/2019" "12/04/2019" "12/04/2019" "12/04/2019" "12/04/2019" "12/04/2019" "12/04/2019" "12/04/2019" 
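Once the column is read as character, one quick way to see every layout present (not just the first 20 rows) is to replace each digit with a placeholder and tabulate the resulting shapes. This is a base-R sketch; `dob` is a made-up stand-in for `data$dob`, built from values in the `head()` output above plus the hyphenated value from the second warning:

```r
# Stand-in for data$dob after reading the column as character
dob <- c("12/04/20", "20/04/2020", "12/04/2019", "12-04-2019")

# Replace every digit with "9" so only the layout of each value remains
shapes <- gsub("[0-9]", "9", dob)

# Count how many rows use each layout
table(shapes)
```

On the real data, `table(gsub("[0-9]", "9", data$dob))` would list every layout in the column, including the two-digit year that appears in the first row of the `head()` output.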

BEST ANSWER

Following Bernhard's suggestions, I found that the input date format was inconsistent: different rows used different separators (/ in some, - in others). I fixed this by converting all of them to the same separator.
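A minimal sketch of that fix, assuming the only variation is `/` versus `-` (the temporary file and its three rows are made up for illustration). The column is read as character, every `-` is replaced with `/`, and the result is parsed with a single format:

```r
library(readr)

# Write a small made-up CSV whose dob column mixes separators
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,dob", "1,20/04/2020", "2,12-04-2019", "3,12/04/2019"), tmp)

# Read dob as character first so no values are lost to parsing failures
data <- read_csv(tmp, col_types = cols(id = col_integer(), dob = col_character()))

# Standardise the separator to "/", then parse with one explicit dmy format
data$dob <- parse_date(gsub("-", "/", data$dob, fixed = TRUE), format = "%d/%m/%Y")

data$dob
```

A value with a two-digit year, such as the `12/04/20` in the `head()` output in the question, would still not come out as 2020 under `%Y`, so it is worth checking for `NA` or unexpected years after the parse and handling those rows separately.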