Sample data:
# Set seed for reproducibility
set.seed(123)
# Create a sample dataframe with 100 observations
n_obs <- 100
df <- data.frame(
  serial_id = 1:n_obs,
  code_1 = sample(c("yes", "no"), n_obs, replace = TRUE),
  code_2 = sample(c("yes", "no"), n_obs, replace = TRUE),
  code_3 = sample(c("yes", "no"), n_obs, replace = TRUE),
  type_1 = sample(c("A", "B", "C", "D"), n_obs, replace = TRUE),
  type_2 = sample(c("A", "B", "C", "D"), n_obs, replace = TRUE),
  type_3 = sample(c("A", "B", "C", "D"), n_obs, replace = TRUE)
)
I am trying to create a variable that satisfies the following logic:
- For each row: if any of the code_* columns has "yes" AND the corresponding type_* column (for example, code_1 corresponds to type_1, and so on) has "A", the new variable takes the value "1".
- For each row: if any of the code_* columns has "yes" AND the corresponding type_* column (for example, code_1 corresponds to type_1, and so on) has "B", the new variable takes the value "0". This rule overrides all the previous rules, even if the row also contains a combination of "yes" and "A" that would otherwise have resulted in "1".
- For each row: if there is a "no" in the corresponding code_* column for both "B" and "C" in any of the type_* columns, and a "yes" in a code_* column whose corresponding type_* column has "C", then new_var == 1.
I could not figure out how to get the names of the corresponding columns based on the last character (1, 2, 3, 4, 5, 6, ...) and then perform a rowwise operation that takes into account the columns of that isolated row. The original data has about 20 such pairs of code_* and type_* columns, so I am trying to come up with something iterable.
Convert to long, cast to a wider format with the code and type paired, apply the rules by serial_id with any(), and then left join the new variable back to the original data set on serial_id. This should work for as many code/type pairs as you have.
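A minimal sketch of that approach, assuming dplyr and tidyr are available. It uses a small hand-made data frame instead of the random sample so the result is easy to check by eye. Two things here are assumptions rather than part of the question: the ".value" sentinel in pivot_longer() is used as a shortcut that pairs code and type in a single reshape (instead of a separate long and then wide step), and the third rule is read as "no yes/B pair in the row and at least one yes/C pair gives 1". Rows that match no rule are left as NA, which the question does not specify. The names long, flags, pair, yes_A, yes_B and yes_C are purely illustrative.

```r
library(dplyr)
library(tidyr)

# Small illustrative data set (hypothetical, two code/type pairs)
df <- data.frame(
  serial_id = 1:4,
  code_1 = c("yes", "yes", "no", "no"),
  code_2 = c("no", "yes", "yes", "no"),
  type_1 = c("A", "A", "B", "D"),
  type_2 = c("B", "B", "C", "A")
)

# Reshape so each row is one (serial_id, pair) with its code and type side by side.
# ".value" keeps the prefix (code/type) as column names; the suffix goes to `pair`.
long <- df %>%
  pivot_longer(
    cols = -serial_id,
    names_to = c(".value", "pair"),
    names_sep = "_"
  )

# Apply the rules per serial_id with any()
flags <- long %>%
  group_by(serial_id) %>%
  summarise(
    yes_A = any(code == "yes" & type == "A"),
    yes_B = any(code == "yes" & type == "B"),
    yes_C = any(code == "yes" & type == "C"),
    .groups = "drop"
  ) %>%
  mutate(
    new_var = case_when(
      yes_B ~ 0L,          # "yes" + "B" overrides everything else
      yes_A ~ 1L,          # "yes" + "A"
      yes_C ~ 1L,          # assumed reading of rule 3: "yes" + "C" and no "yes" + "B"
      TRUE  ~ NA_integer_  # assumption: no rule matched
    )
  )

# Left join the new variable back onto the original data
result <- df %>%
  left_join(select(flags, serial_id, new_var), by = "serial_id")

result$new_var
# 1 0 1 NA
```

Because the pairing is driven by the column-name suffix, the same code should carry over to 20 or more code_*/type_* pairs without changes, as long as every column other than serial_id follows the prefix_number naming pattern.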