I have a dataset that contains duplicate rows when grouped by two columns:
ID Group Value
1 z1 0
1 z1 0.81
2 z2 2.89
2 z2 1.53
3 z1 -0.23
3 z1 0
4 z3 10.75
4 z3 8.13
5 x2 0.45
5 x2 1.43
library(dplyr)

df <- tibble(
ID = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
Group = c("z1", "z1", "z2", "z2", "z1", "z1", "z3", "z3", "x2", "x2"),
Value = c(0, 0.81, 2.89, 1.53, -0.23, 0, 10.75, 8.13, 0.45, 1.43)
)
I am trying to retain only the non-zero rows whenever there are duplicate rows within each combination of the ID and Group columns.
Expected output
ID Group Value
1 z1 0.81
2 z2 2.89
2 z2 1.53
3 z1 -0.23
4 z3 10.75
4 z3 8.13
5 x2 0.45
5 x2 1.43
The rows with 0s for IDs 1 and 3 are excluded. This is what I have tried so far:
df %>%
  group_by(ID, Group) %>%
  filter(if (any(duplicated(.))) Value != 0 else TRUE) %>%
  ungroup()
My original dataset has 22,000 rows; the code above ran for a long time without finishing. I then tried this:
df %>%
  group_by(ID, Group) %>%
  slice(if (any(duplicated(.))) which(Value != 0) else 1:n()) %>%
  ungroup()
This still does not solve the problem: it does not give the expected results, the code runs for a long time, and nothing seems to happen.
Any suggestions on how to tweak this code would be much appreciated. Thanks.
Why not just:
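A vectorized `filter()` avoids the per-group `if`/`else` and `duplicated(.)` (which looks at the whole data frame, not the group) and should be fast even on 22,000 rows. This is a minimal sketch, assuming the rule is "drop zero-value rows only in groups that have more than one row" (a plain `filter(Value != 0)` would also work for your sample data, but would discard a group whose single row happens to be 0):

```r
library(dplyr)

df <- tibble(
  ID = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  Group = c("z1", "z1", "z2", "z2", "z1", "z1", "z3", "z3", "x2", "x2"),
  Value = c(0, 0.81, 2.89, 1.53, -0.23, 0, 10.75, 8.13, 0.45, 1.43)
)

result <- df %>%
  group_by(ID, Group) %>%
  # keep a zero row only when it is the group's sole row;
  # in groups with duplicates, the zeros are dropped
  filter(Value != 0 | n() == 1) %>%
  ungroup()

result
```

On your example this keeps 8 rows, removing the `Value == 0` rows for IDs 1 and 3, which matches the expected output.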