How to add Zeros where observations are missing

Question

How to add Zeros where observations are missing

35 Views Asked by McMahok At 19 March 2024 at 12:38

Here's an example of some data that I have

 dput(df)
structure(list(ID = c("a", "b", "c", "d", "e", "f", "g", "h", 
"i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "b", 
"c", "d", "e", "f", "j", "k", "n", "m", "q", "r"), Number = c(1, 
2, 1, 3, 4, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 
1, 2, 1, 1, 1, 1, 1, 2, 2), Location = c(1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2), Surveyor = c("JKK", "JKK", "JKK", "JKK", "JKK", "JKK", 
"JKK", "JKK", "JKK", "JKK", "JKK", "JKK", "JKK", "JKK", "JKK", 
"JKK", "JKK", "JKK", "JKK", "JKK", "JKK", "JKK", "JKK", "JKK", 
"JKK", "JKK", "JKK", "JKK", "JKK", "JKK", "JKK")), row.names = c(NA, 
-31L), spec = structure(list(cols = list(ID = structure(list(), class = c("collector_character", 
"collector")), Number = structure(list(), class = c("collector_double", 
"collector")), Location = structure(list(), class = c("collector_double", 
"collector")), Surveyor = structure(list(), class = c("collector_character", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), delim = ","), class = "col_spec"), problems = <pointer: 0x00000253510611f0>, class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"))

In the above example I have the col ID which contains the letters a-t (my real data have other letter codes), in Location 1 all of these codes are present. However in Location 2 some of them are missing.

I would like to add 0 into the Number col where there are missing IDs.

My real data has 25 locations so ideally I'd like to be able to grab the full list of IDs and check that against each Location and add 0 where they are missing.

I've tried tidyverse::mutate with case_when but I'm not getting anywhere. Any help appreciated.

Original Q&A

There are 1 best solutions below

**SamR** · Accepted Answer · 2024-03-19T12:44:16.227000

You can use tidyr::complete() for this. The fill parameter allows you to set the values in the columns you create.

df  |> 
    tidyr::complete(
        ID,
        Location, 
        fill = list(Number = 0)
    )

For the new values, the output will have 0 for Number and NA for any columns not explicitly set, i.e. Surveyor.

Output

I assigned the above to a variable called new_df so you can easily compare with the original df:

dplyr::anti_join(new_df, df, by = c("ID", "Location"))

# A tibble: 9 × 4
  ID    Location Number Surveyor
  <chr>    <dbl>  <dbl> <chr>   
1 a            2      0 NA      
2 g            2      0 NA      
3 h            2      0 NA      
4 i            2      0 NA      
5 l            2      0 NA      
6 o            2      0 NA      
7 p            2      0 NA      
8 s            2      0 NA      
9 t            2      0 NA

How to add Zeros where observations are missing

There are 1 best solutions below

Output

Related Questions in R

Related Questions in MISSING-DATA

Trending Questions

Popular # Hahtags

Popular Questions