r retain non zero rows in case of duplicates

88 Views Asked by At

I have a dataset with duplicate rows if we group by two columns

 ID    Group       Value    
   1     z1       0 
   1     z1       0.81 
   2     z2       2.89 
   2     z2       1.53
   3     z1       -0.23
   3     z1       0
   4     z3       10.75
   4     z3       8.13
   5     x2       0.45
   5     x2       1.43
df <- tibble(
  ID = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  Group = c("z1", "z1", "z2", "z2", "z1", "z1", "z3", "z3", "x2", "x2"),
  Value = c(0, 0.81, 2.89, 1.53, -0.23, 0, 10.75, 8.13, 0.45, 1.43)
)

I am trying to retain only non-zero rows if there are duplicate rows when grouping by ID and Group columns

Expected output

 ID    Group       Value    
 
   1     z1       0.81 

   2     z2       2.89 
   2     z2       1.53

   3     z1       -0.23

   4     z3       10.75
   4     z3       8.13

   5     x2       0.45
   5     x2       1.43

The rows with 0s in ID 1 and 3 are excluded. This is what I tried so far

df %>%
group_by(ID, GroupID) %>%
filter(if (any(duplicated(.) Value != 0 else TRUE) %>%
ungroup()

I have 22000 rows in my original dataset, the code above took a long time and did not converge. I then tried this.

df %>%
group_by(ID, GroupID) %>%
slice(if (any(duplicated(.))) which(Value != 0) else 1:n()) %>%
ungroup()

This is still not solving the problem, not giving me the expected results, the code takes a long time , and nothing seems to happen.

Any suggestion on how to tweak this code is much appreciated. Thanks.

1

There are 1 best solutions below

1
On

Why not just:

library(dplyr)
df %>% 
  filter(Value != 0)
 ID Group Value
1  1    z1  0.81
2  2    z2  2.89
3  2    z2  1.53
4  3    z1 -0.23
5  4    z3 10.75
6  4    z3  8.13
7  5    x2  0.45
8  5    x2  1.43