I have data that has the following structure:

data <- data.frame(
  uniqueid = c(1, 1, 2, 2, 3, 3),
  year = c(2010, 2011, 2010, 2011, 2010, 2011),
  agency = c("SZ", "SZ", "SZ", NA, "SZ", "HE"),
  switch = c(0, 0, 0, NA, 0, 1)
)

As you can see, the data is organized by uniqueids appearing in a given year. Keep in mind that for the agency column there can be 13 different unique strings appearing across different uniqueids. I would like the data to look like the following:

data <- data.frame(
  uniqueid = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3),
  year = c(2010, 2010, 2011, 2011, 2010, 2010, 2011, 2011, 2011, 2010, 2010, 2011, 2011),
  agency = c("SZ", "HE", "SZ", "HE", "SZ", "HE", "SZ", NA, "HE", "SZ", "HE", "SZ", "HE"),
  switch = c(0, 0, 0, 0, 0, 0, NA, NA, NA, 0, 0, 0, 1)
)

In this transformation, the rows take on the different unique values of the agency variable, and the switch variable largely maps what it had previously looked like. I'm not really sure how to accomplish this in R, though I would prefer the solution to be within tidyverse. Thank you!

I've been trying something like the following but I don't seem to be getting exactly what I want:

data1 <- data %>%
      pivot_wider(names_from = agency, values_from = lead, names_prefix = "agency_", values_fill = "0") %>%
      gather(key = agency, value = lead, starts_with("agency_")) %>%
      arrange(uniqueid, year, agency)
3

There are 3 best solutions below

1
On BEST ANSWER

Here's yet another way:

Steps:

  1. complete the data (add a row for every variation of id, year, and just the "SZ" and "HE" agencies)
  2. If none of the agency values in each uniqueid and year combination are NA, then change the new row switch values in that group to be 0. Otherwise, keep them NA
data |>
  complete(uniqueid, year, agency = c("SZ", "HE")) |>
  mutate(switch = ifelse(!any(is.na(agency)) & is.na(switch), 0, switch), .by = c(uniqueid, year))

Output:

# A tibble: 13 × 4
   uniqueid  year agency switch
      <dbl> <dbl> <chr>   <dbl>
 1        1  2010 HE          0
 2        1  2010 SZ          0
 3        1  2011 HE          0
 4        1  2011 SZ          0
 5        2  2010 HE          0
 6        2  2010 SZ          0
 7        2  2011 HE         NA
 8        2  2011 SZ         NA
 9        3  2010 HE          0
10        3  2010 SZ          0
11        3  2011 HE          1
12        3  2011 SZ          0
13        2  2011 NA         NA
0
On

You could break this question into 3 steps, give it a try:

0. Let's call your raw data as raw_data:

library(tidyverse)

raw_data <- data.frame(
  uniqueid = c(1, 1, 2, 2, 3, 3),
  year = c(2010, 2011, 2010, 2011, 2010, 2011),
  agency = c("SZ", "SZ", "SZ", NA, "SZ", "HE"),
  switch = c(0, 0, 0, NA, 0, 1)
)

1. Isolate the row(s) where the agency column is NA:

NA_rows <- raw_data |> filter(is.na(agency))

2. Process the filled rows:

filled_rows <- raw_data |> 
  
  complete(uniqueid, year, agency) |> 
  
  select(-switch) |> 
  
  filter(!is.na(agency)) |> 
  
  left_join(raw_data, join_by(uniqueid, year, agency)) |> 
  
  mutate(switch = case_when(
    is.na(switch) ~ 0,
    TRUE ~ switch
  ))

3. Bind the filled_rows together with the NA_rows:

bind_rows(filled_rows, NA_rows)

# A tibble: 13 × 4
   uniqueid  year agency switch
      <dbl> <dbl> <chr>   <dbl>
 1        1  2010 HE          0
 2        1  2010 SZ          0
 3        1  2011 HE          0
 4        1  2011 SZ          0
 5        2  2010 HE          0
 6        2  2010 SZ          0
 7        2  2011 HE          0
 8        2  2011 SZ          0
 9        3  2010 HE          0
10        3  2010 SZ          0
11        3  2011 HE          1
12        3  2011 SZ          0
13        2  2011 NA         NA

I am sure there are more elegant approaches, hope it is helpful.

2
On
  1. Filter rows with NA out
  2. Complete the data with missing combinations
  3. Perform dplyr::union(not base::union) on the completed and original dataframes to include those rows with missing agency.
  4. Re-arrange.
library(tidyverse)

data %>%
  drop_na() %>%
  complete(uniqueid, year, agency, fill = list(switch = 0)) %>%
  union(data) %>%
  mutate(switch = case_when(!anyNA(agency) ~ switch), .by = c(uniqueid, year)) %>%
  arrange(uniqueid, year)

# A tibble: 13 × 4
   uniqueid  year agency switch
      <dbl> <dbl> <chr>   <dbl>
 1        1  2010 HE          0
 2        1  2010 SZ          0
 3        1  2011 HE          0
 4        1  2011 SZ          0
 5        2  2010 HE          0
 6        2  2010 SZ          0
 7        2  2011 HE         NA
 8        2  2011 SZ         NA
 9        2  2011 NA         NA
10        3  2010 HE          0
11        3  2010 SZ          0
12        3  2011 HE          1
13        3  2011 SZ          0