Here's an example of some data that I have
dput(df)
structure(list(ID = c("a", "b", "c", "d", "e", "f", "g", "h",
"i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "b",
"c", "d", "e", "f", "j", "k", "n", "m", "q", "r"), Number = c(1,
2, 1, 3, 4, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1,
1, 2, 1, 1, 1, 1, 1, 2, 2), Location = c(1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2), Surveyor = c("JKK", "JKK", "JKK", "JKK", "JKK", "JKK",
"JKK", "JKK", "JKK", "JKK", "JKK", "JKK", "JKK", "JKK", "JKK",
"JKK", "JKK", "JKK", "JKK", "JKK", "JKK", "JKK", "JKK", "JKK",
"JKK", "JKK", "JKK", "JKK", "JKK", "JKK", "JKK")), row.names = c(NA,
-31L), spec = structure(list(cols = list(ID = structure(list(), class = c("collector_character",
"collector")), Number = structure(list(), class = c("collector_double",
"collector")), Location = structure(list(), class = c("collector_double",
"collector")), Surveyor = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), problems = <pointer: 0x00000253510611f0>, class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))
In the above example I have the col ID which contains the letters a-t (my real data have other letter codes), in Location 1 all of these codes are present. However in Location 2 some of them are missing.
I would like to add 0 into the Number col where there are missing IDs.
My real data has 25 locations so ideally I'd like to be able to grab the full list of IDs and check that against each Location and add 0 where they are missing.
I've tried tidyverse::mutate with case_when but I'm not getting anywhere. Any help appreciated.
You can use
tidyr::complete()for this. Thefillparameter allows you to set the values in the columns you create.For the new values, the output will have
0forNumberandNAfor any columns not explicitly set, i.e.Surveyor.Output
I assigned the above to a variable called
new_dfso you can easily compare with the originaldf: