R: read_csv ignores col_types


I'm using read_csv to read in data while trying to ensure each variable is read in as the correct type. To do so, I've first set up a "data dictionary" containing the mapping between variable names and their types (in "short" format, e.g., "d" for double, "c" for character, etc.). My code gets the columns I want and the data types I expect them to be, but when I pass those values to the col_select and col_types arguments, the resulting column types don't always match what I specified.
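To make the setup concrete, here is a minimal sketch of how such a data dictionary could produce the two arguments. It assumes the dictionary is a named character vector; the names `data_dict`, `wanted_cols`, and `type_string` are illustrative and not the original code.

```r
# Hypothetical data dictionary: variable name -> compact readr type code
data_dict <- c(
  read_var_2 = "D",  # date
  read_var_1 = "d",  # double
  read_var_3 = "c"   # character
)

# Column names to keep, in dictionary order (intended for col_select)
wanted_cols <- names(data_dict)

# Compact type string, one letter per wanted column (intended for col_types)
type_string <- paste(data_dict, collapse = "")
```

With this dictionary, `type_string` is "Ddc", the same value passed to col_types in the first call below.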

library(tidyverse)  # provides tibble(), write_csv(), read_csv(), all_of()

d1 = tibble(read_var_2 = "2024-02-29", read_var_1 = "4", ignore_var_1 = "null", read_var_3 = "text")
write_csv(d1, "data_d1.csv")

test_d1 = read_csv(
  "data_d1.csv",
  col_select = c("read_var_2", "read_var_1", "read_var_3"),
  col_types = "Ddc"
)
test_d1 ## works as expected
# # A tibble: 1 × 3
#   read_var_2 read_var_1 read_var_3
#   <date>          <dbl> <chr>     
# 1 2024-02-29          4 text      

d2 = tibble(read_var_2 = "2024-02-29", read_var_1 = "4", ignore_var_1 = "null", read_var_3 = "text", read_var_4 = "37.89")
write_csv(d2, "data_d2.csv")

test_d2 = read_csv(
  "data_d2.csv", 
  col_select = all_of(c("read_var_2", "read_var_1", "read_var_3", "read_var_4")),
  col_types = "Ddcd"
)

## PROBLEM
test_d2 ## why is read_var_3 NA dbl, it should be character "text"?
# # A tibble: 1 × 4
#   read_var_2 read_var_1 read_var_3 read_var_4
#   <date>          <dbl>      <dbl>      <dbl>
# 1 2024-02-29          4         NA       37.9
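
One possible explanation, offered as an assumption rather than something confirmed here: a compact col_types string seems to be matched by position against the columns in the file, not against the columns kept by col_select. Under that reading, the "c" in "Ddcd" lands on ignore_var_1 and the final "d" lands on read_var_3, which would produce exactly the NA above; the first example would then only have worked by coincidence, because read_var_3 fell outside the string and was guessed as character. A minimal sketch of a workaround is to match types by name instead. The `data_dict` vector and the temporary file path are hypothetical stand-ins.

```r
library(readr)
library(tibble)

# Recreate the second file from the question in a temporary location
d2 <- tibble(
  read_var_2 = "2024-02-29", read_var_1 = "4", ignore_var_1 = "null",
  read_var_3 = "text", read_var_4 = "37.89"
)
path <- tempfile(fileext = ".csv")
write_csv(d2, path)

# Hypothetical data dictionary: variable name -> compact type code
data_dict <- c(read_var_2 = "D", read_var_1 = "d",
               read_var_3 = "c", read_var_4 = "d")

# Build a named column specification so each type is tied to a column
# name rather than to a position in the file
spec <- do.call(cols, as.list(data_dict))

fixed_d2 <- read_csv(
  path,
  col_select = tidyselect::all_of(names(data_dict)),
  col_types = spec
)
```

Because the specification is keyed by name, adding or dropping unselected columns in the file should no longer shift which type applies to which variable.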