Compact code for conditional replacement of values with an "OR" condition

I have a very long dataset and a relatively short list of ID values for which my data is wrong. The following works, but my wrong_IDs vector is actually much larger:

wrong_IDs <- c('A1', 'B3', 'B7', 'Z31')
df$var1[df$var2 == 'A1' | df$var2 == 'B3' | df$var2 == 'B7' | df$var2 == 'Z31'] <- 0L

This looks very basic, but I haven't found a compact way to write it. Thanks for any help.

Best answer

You can test your data against the vector of wrong IDs with the %in% operator:

df <- data.frame(var1 = 101:120, var2 = 1:20)
wrong_ids <- c(3, 5, 7)
df$var1[df$var2 %in% wrong_ids] <- 0

Here df$var2 %in% wrong_ids returns a logical vector that is TRUE for every row whose var2 value appears in wrong_ids, so the assignment sets var1 to 0 only in those rows (here rows 3, 5 and 7).
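Applied to character IDs like those in the question, the same idea might look like the sketch below. The data frame contents are made up for illustration; only wrong_IDs and the column names var1 and var2 come from the question.

```r
# Hypothetical data: only the column names and wrong_IDs come from the question
df <- data.frame(
  var1 = c(10L, 20L, 30L, 40L, 50L),
  var2 = c("A1", "A2", "B3", "Z31", "Z32"),
  stringsAsFactors = FALSE
)
wrong_IDs <- c("A1", "B3", "B7", "Z31")

# Logical vector: TRUE where var2 is one of the wrong IDs
is_wrong <- df$var2 %in% wrong_IDs
is_wrong
# [1]  TRUE FALSE  TRUE  TRUE FALSE

# Overwrite var1 only in the flagged rows
df$var1[is_wrong] <- 0L
df$var1
# [1]  0 20  0  0 50
```

Unlike a chain of == comparisons, %in% returns FALSE rather than NA when var2 is missing, so rows with an NA ID are simply left unchanged.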

Another answer

Here's a very compact solution using grepl and regex:

Some illustrative data:

set.seed(123)
df <- data.frame(
  ID = paste0(rep(LETTERS[1:3], 2), sample(1:3, 6, replace = TRUE)),
  Var2 = rnorm(6),
  stringsAsFactors = FALSE)
df

wrong_IDs <- c('A1', 'B3', 'B1', 'C3')

To set to 0 those rows that contain the wrong_IDs, you can collapse these values into a single string separated by the regex alternation operator | and have grepl match these alternatives in df$ID. Wrapping the alternatives in ^( ... )$ anchors the pattern to the whole string, so that 'A1' does not also match an ID such as 'A10':

df$ID <- ifelse(grepl(paste0("^(", paste(wrong_IDs, collapse = "|"), ")$"), df$ID), 0, df$ID)
df
  ID        Var2
1  0  0.07050839
2  0  0.12928774
3 C2  1.71506499
4 A3  0.46091621
5  0 -1.26506123
6 C1 -0.68685285
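
Because grepl matches substrings unless the pattern is anchored, it can be worth checking how a pattern behaves on IDs that share a prefix or suffix. A minimal sketch with made-up IDs (the vector ids and the names loose and strict are illustrative, not from the question):

```r
# Hypothetical IDs chosen to expose substring matches
ids <- c("A1", "A10", "B3", "XB3", "C2")
wrong_IDs <- c("A1", "B3")

# Unanchored alternation: also flags "A10" and "XB3"
loose <- grepl(paste(wrong_IDs, collapse = "|"), ids)
loose
# [1]  TRUE  TRUE  TRUE  TRUE FALSE

# Anchored alternation: whole-string matches only
strict <- grepl(paste0("^(", paste(wrong_IDs, collapse = "|"), ")$"), ids)
strict
# [1]  TRUE FALSE  TRUE FALSE FALSE

# For exact matching, %in% gives the same result without building a regex
identical(strict, ids %in% wrong_IDs)
# [1] TRUE
```

If the IDs could contain regex metacharacters such as a dot or parentheses, %in% is probably the safer choice, since it compares strings literally.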