I have a series of data sets and a dictionary to bring these together. But I'm struggling to figure out how to automate this.
Suppose I have this data and dictionary (the actual one is much longer, which is why I want to automate):
library(dplyr)
library(tibble)
mtcarsA <- mtcars[1:5, ] %>% rename(mpgA = mpg, cyl_A = cyl) %>% as_tibble()
mtcarsB <- mtcars[6:10, ] %>% rename(mpg_B = mpg, B_cyl = cyl) %>% as_tibble()
dic <- tibble(true_name = c("mpg_true", "cyl_true"),
              nameA = c("mpgA", "cyl_A"),
              nameB = c("mpg_B", "B_cyl")
)
I want these datasets (from years A and B) appended to one another, and then to have the names changed or coalesced to the 'true_name' values.
I can bring the data sets together into mtcars_all, and then I tried recoding the column names with the dictionary as follows:
mtcars_all <- bind_rows(mtcarsA, mtcarsB)
recode_colname <- function(df, tn = dic$true_name, fname) {
  colnames(df) <- dplyr::recode(colnames(df),
                                !!!setNames(as.character(tn), fname))
  return(df)
}
mtcars_all <- mtcars_all %>%
  recode_colname(fname = dic$nameA) %>%
  recode_colname(fname = dic$nameB)
But then I get duplicate columns. Of course I could coalesce each of these duplicate columns by name, but there will be many of these in my real case, so I want to automate 'coalesce all columns with duplicate names'.
I'm giving the entire problem here because perhaps someone also has a better solution for 'using a data dictionary'.
You can create a named vector that maps each year-specific column name to its true name.
Then apply it to a list of data frames and combine them.
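A minimal sketch of that idea, using the example data from the question. The names `lookup` and `rename_from_lookup` are made up for illustration, and the code assumes that every column of `dic` after the first holds old names and that no old name appears twice. Renaming each data frame before binding means the duplicate columns never arise, so no coalescing step is needed.

```r
library(dplyr)
library(tibble)

# Example data and dictionary from the question
mtcarsA <- mtcars[1:5, ] %>% rename(mpgA = mpg, cyl_A = cyl) %>% as_tibble()
mtcarsB <- mtcars[6:10, ] %>% rename(mpg_B = mpg, B_cyl = cyl) %>% as_tibble()
dic <- tibble(true_name = c("mpg_true", "cyl_true"),
              nameA = c("mpgA", "cyl_A"),
              nameB = c("mpg_B", "B_cyl"))

# Named vector: names are the year-specific column names, values are the true names.
# unlist() walks dic[-1] column by column, so true_name is repeated once per name column.
lookup <- setNames(rep(dic$true_name, times = ncol(dic) - 1),
                   unlist(dic[-1], use.names = FALSE))

# Hypothetical helper: rename only the columns found in the lookup, leave the rest alone.
rename_from_lookup <- function(df, lookup) {
  rename_with(df, function(nm) unname(lookup[nm]), .cols = any_of(names(lookup)))
}

# Rename each data frame first, then stack them.
mtcars_all <- list(mtcarsA, mtcarsB) %>%
  lapply(rename_from_lookup, lookup = lookup) %>%
  bind_rows()
```

Adding another year should then only require a new column in `dic` and a new element in the list.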