How to create new data rows for each unique value in a column (potentially grouped by other variables) in R

Question

How to create new data rows for each unique value in a column (potentially grouped by other variables) in R

108 Views Asked by sheri At 17 August 2025 at 18:48

I have data that has the following structure:

data <- data.frame(
  uniqueid = c(1, 1, 2, 2, 3, 3),
  year = c(2010, 2011, 2010, 2011, 2010, 2011),
  agency = c("SZ", "SZ", "SZ", NA, "SZ", "HE"),
  switch = c(0, 0, 0, NA, 0, 1)
)

As you can see, the data is organized by uniqueids appearing in a given year. Keep in mind that for the agency column there can be 13 different unique strings appearing across different uniqueids. I would like the data to look like the following:

data <- data.frame(
  uniqueid = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3),
  year = c(2010, 2010, 2011, 2011, 2010, 2010, 2011, 2011, 2011, 2010, 2010, 2011, 2011),
  agency = c("SZ", "HE", "SZ", "HE", "SZ", "HE", "SZ", NA, "HE", "SZ", "HE", "SZ", "HE"),
  switch = c(0, 0, 0, 0, 0, 0, NA, NA, NA, 0, 0, 0, 1)
)

In this transformation, the rows take on the different unique values of the agency variable, and the switch variable largely maps what it had previously looked like. I'm not really sure how to accomplish this in R, though I would prefer the solution to be within tidyverse. Thank you!

I've been trying something like the following but I don't seem to be getting exactly what I want:

data1 <- data %>%
      pivot_wider(names_from = agency, values_from = lead, names_prefix = "agency_", values_fill = "0") %>%
      gather(key = agency, value = lead, starts_with("agency_")) %>%
      arrange(uniqueid, year, agency)

Original Q&A

There are 3 best solutions below

Grasshopper_NZ On 01 December 2023 at 06:06

You could break this question into 3 steps, give it a try:

0. Let's call your raw data as `raw_data`:

library(tidyverse)

raw_data <- data.frame(
  uniqueid = c(1, 1, 2, 2, 3, 3),
  year = c(2010, 2011, 2010, 2011, 2010, 2011),
  agency = c("SZ", "SZ", "SZ", NA, "SZ", "HE"),
  switch = c(0, 0, 0, NA, 0, 1)
)

1. Isolate the row(s) where the `agency` column is NA:

NA_rows <- raw_data |> filter(is.na(agency))

2. Process the filled rows:

filled_rows <- raw_data |> 
  
  complete(uniqueid, year, agency) |> 
  
  select(-switch) |> 
  
  filter(!is.na(agency)) |> 
  
  left_join(raw_data, join_by(uniqueid, year, agency)) |> 
  
  mutate(switch = case_when(
    is.na(switch) ~ 0,
    TRUE ~ switch
  ))

3. Bind the `filled_rows` together with the `NA_rows`:

bind_rows(filled_rows, NA_rows)

# A tibble: 13 × 4
   uniqueid  year agency switch
      <dbl> <dbl> <chr>   <dbl>
 1        1  2010 HE          0
 2        1  2010 SZ          0
 3        1  2011 HE          0
 4        1  2011 SZ          0
 5        2  2010 HE          0
 6        2  2010 SZ          0
 7        2  2011 HE          0
 8        2  2011 SZ          0
 9        3  2010 HE          0
10        3  2010 SZ          0
11        3  2011 HE          1
12        3  2011 SZ          0
13        2  2011 NA         NA

I am sure there are more elegant approaches, hope it is helpful.

Darren Tsai On 01 December 2023 at 07:19

Filter rows with NA out
Complete the data with missing combinations
Perform dplyr::union(not base::union) on the completed and original dataframes to include those rows with missing agency.
Re-arrange.

library(tidyverse)

data %>%
  drop_na() %>%
  complete(uniqueid, year, agency, fill = list(switch = 0)) %>%
  union(data) %>%
  mutate(switch = case_when(!anyNA(agency) ~ switch), .by = c(uniqueid, year)) %>%
  arrange(uniqueid, year)

# A tibble: 13 × 4
   uniqueid  year agency switch
      <dbl> <dbl> <chr>   <dbl>
 1        1  2010 HE          0
 2        1  2010 SZ          0
 3        1  2011 HE          0
 4        1  2011 SZ          0
 5        2  2010 HE          0
 6        2  2010 SZ          0
 7        2  2011 HE         NA
 8        2  2011 SZ         NA
 9        2  2011 NA         NA
10        3  2010 HE          0
11        3  2010 SZ          0
12        3  2011 HE          1
13        3  2011 SZ          0

**Mark** · Accepted Answer

Here's yet another way:

Steps:

complete the data (add a row for every variation of id, year, and just the "SZ" and "HE" agencies)
If none of the agency values in each uniqueid and year combination are NA, then change the new row switch values in that group to be 0. Otherwise, keep them NA

data |>
  complete(uniqueid, year, agency = c("SZ", "HE")) |>
  mutate(switch = ifelse(!any(is.na(agency)) & is.na(switch), 0, switch), .by = c(uniqueid, year))

Output:

# A tibble: 13 × 4
   uniqueid  year agency switch
      <dbl> <dbl> <chr>   <dbl>
 1        1  2010 HE          0
 2        1  2010 SZ          0
 3        1  2011 HE          0
 4        1  2011 SZ          0
 5        2  2010 HE          0
 6        2  2010 SZ          0
 7        2  2011 HE         NA
 8        2  2011 SZ         NA
 9        3  2010 HE          0
10        3  2010 SZ          0
11        3  2011 HE          1
12        3  2011 SZ          0
13        2  2011 NA         NA

How to create new data rows for each unique value in a column (potentially grouped by other variables) in R

There are 3 best solutions below

0. Let's call your raw data as `raw_data`:

1. Isolate the row(s) where the `agency` column is NA:

2. Process the filled rows:

3. Bind the `filled_rows` together with the `NA_rows`:

Related Questions in R

Related Questions in TIDYVERSE

Related Questions in DATA-MANIPULATION

Related Questions in DATA-TRANSFORM

Trending Questions

Popular # Hahtags

Popular Questions

How to create new data rows for each unique value in a column (potentially grouped by other variables) in R

There are 3 best solutions below

0. Let's call your raw data as raw_data:

1. Isolate the row(s) where the agency column is NA:

2. Process the filled rows:

3. Bind the filled_rows together with the NA_rows:

Related Questions in R

Related Questions in TIDYVERSE

Related Questions in DATA-MANIPULATION

Related Questions in DATA-TRANSFORM

Trending Questions

Popular # Hahtags

Popular Questions

0. Let's call your raw data as `raw_data`:

1. Isolate the row(s) where the `agency` column is NA:

3. Bind the `filled_rows` together with the `NA_rows`: