I'm using read_csv to read in data while making sure each variable is read in as the correct type. To do so, I first set up a "data dictionary" that maps variable names to their types in readr's compact format (e.g., "d" for double, "c" for character). My code produces the columns I want and the types I expect them to have, but when I pass those values to the col_select and col_types arguments, the columns don't always come back with the types I specified.
library(readr)
library(tibble)

d1 = tibble(read_var_2 = "2024-02-29", read_var_1 = "4", ignore_var_1 = "null", read_var_3 = "text")
write_csv(d1, "data_d1.csv")
test_d1 = read_csv(
  "data_d1.csv",
  col_select = c("read_var_2", "read_var_1", "read_var_3"),
  col_types = "Ddc"
)
test_d1 ## works as expected
# # A tibble: 1 × 3
#   read_var_2 read_var_1 read_var_3
#   <date>          <dbl> <chr>
# 1 2024-02-29          4 text
d2 = tibble(read_var_2 = "2024-02-29", read_var_1 = "4", ignore_var_1 = "null", read_var_3 = "text", read_var_4 = "37.89")
write_csv(d2, "data_d2.csv")
test_d2 = read_csv(
  "data_d2.csv",
  col_select = all_of(c("read_var_2", "read_var_1", "read_var_3", "read_var_4")),
  col_types = "Ddcd"
)
## PROBLEM
test_d2 ## why is read_var_3 a <dbl> NA when it should be the character "text"?
# # A tibble: 1 × 4
#   read_var_2 read_var_1 read_var_3 read_var_4
#   <date>          <dbl>      <dbl>      <dbl>
# 1 2024-02-29          4         NA       37.9
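
The second result is consistent with the compact col_types string being applied positionally to the columns in the file (including ignore_var_1), not to the columns listed in col_select. Under that reading, "Ddcd" gives "c" to ignore_var_1 and the final "d" to read_var_3, and the first example only works because read_var_3 is left to be guessed as character. That is an interpretation of the output, not something confirmed here. A minimal sketch of a workaround is below. It assumes the data dictionary can be kept as a named character vector, and that cols() accepts the abbreviated type codes by name. The names data_dict, spec, path and test_d2_named are made up for illustration.

```r
library(readr)
library(tibble)

# Hypothetical data dictionary: column name -> compact type code
data_dict <- c(read_var_2 = "D", read_var_1 = "d", read_var_3 = "c", read_var_4 = "d")

# Same data as d2 above, written to a temporary file for this sketch
d2 <- tibble(read_var_2 = "2024-02-29", read_var_1 = "4", ignore_var_1 = "null",
             read_var_3 = "text", read_var_4 = "37.89")
path <- tempfile(fileext = ".csv")
write_csv(d2, path)

# Build a named column specification so each type is matched by column name
# rather than by position in the file
spec <- do.call(cols, as.list(data_dict))

test_d2_named <- read_csv(
  path,
  col_select = all_of(names(data_dict)),
  col_types = spec
)
```

Because the types are tied to names, the order of the columns in the file and the presence of unselected columns such as ignore_var_1 should no longer matter.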