How to replace values in multiple columns according to conditions in R


I am having trouble writing code that replaces all specified values in multiple columns with new values. The data frame has 20+ columns and I only want to change values in 8 of them (col1, col2, col3, etc.). I want to replace the values (4, 5, 6, 7) with (0, -1, -2, -3) respectively. I have very limited knowledge of R and programming, and I have only been able to find a solution that does the job for one column.

I have read many solutions to similar questions on here, but I could not find one that works for me. Here is my code:

data$col1[raw_data$col1 == 4 ] <- 0
data$col1[raw_data$col1 == 5 ] <- -1
data$col1[raw_data$col1 == 6] <- -2
data$col1[raw_data$col1 == 7] <- -3

This works well for one column. Can I do it for all the columns at once?

Here is a snippet of how the columns and values look: [image: dataframe]

Best answer:

Set up an example:

demodf <- data.frame(
  col1 = 1:10,
  col2 = 3:12,
  col3 = 5:14,
  col4 = 7:16
)

cols_to_amend <- c("col1", "col3")

Replace just the relevant columns:

demodf[cols_to_amend] <- apply(demodf[cols_to_amend], 2, FUN = \(x) sapply(x, \(y) if (y %in% 4:7) 4-y else y))

gives:

   col1 col2 col3 col4
1     1    3   -1    7
2     2    4   -2    8
3     3    5   -3    9
4     0    6    8   10
5    -1    7    9   11
6    -2    8   10   12
7    -3    9   11   13
8     8   10   12   14
9     9   11   13   15
10   10   12   14   16

Explanation:

# we can use the vector of column names to choose where we are replacing
demodf[cols_to_amend]
# we then use `apply` with `MARGIN = 2` to apply a function to each column in this data frame:
 <- apply(demodf[cols_to_amend], 2,
# The function we apply will be an anonymous function (`\( )`) taking as its input one column at a time:
FUN = \(x)
# and it will use `sapply` to go down that column:
sapply(x,
# performing the following on each item:
\(y) if (y %in% 4:7) 4-y else y))
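
The `4-y` shortcut only works because the replacements happen to follow a simple pattern. For arbitrary old/new pairs, a base R lookup with `match` is one possible alternative. This is a minimal sketch, not part of the original answer: `old_vals`, `new_vals` and `replace_values` are names made up for illustration.

```r
# Same toy data as above
demodf <- data.frame(col1 = 1:10, col2 = 3:12, col3 = 5:14, col4 = 7:16)
cols_to_amend <- c("col1", "col3")

# Hypothetical lookup vectors: old values and their replacements, in matching order
old_vals <- c(4, 5, 6, 7)
new_vals <- c(0, -1, -2, -3)

# Hypothetical helper: replaces listed values in a vector, leaves the rest alone
replace_values <- function(x) {
  idx <- match(x, old_vals)      # position of each element in old_vals, NA if absent
  hit <- !is.na(idx)             # TRUE where a replacement is needed
  x[hit] <- new_vals[idx[hit]]   # look up the replacement by position
  x
}

# lapply works column by column and keeps the data frame structure
demodf[cols_to_amend] <- lapply(demodf[cols_to_amend], replace_values)
```

Because `match` is vectorised, this handles a whole column in one call instead of going item by item with `sapply`.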

dplyr version:

library(dplyr)

demodf |> 
  mutate(
    across(all_of(cols_to_amend),
           ~ ifelse(.x %in% 4:7, 4-.x, .x)
           )
    )

dplyr version 2:

This is more complex than the toy example needs, but it allows replacements that cannot be expressed as simple arithmetic (the `.default` argument requires dplyr 1.1.0 or later):

demodf |> 
  mutate(
    across(all_of(cols_to_amend),
           ~ case_when(.x == 4 ~ 0,
                       .x == 5 ~ -1,
                       .x == 6 ~ -2,
                       .x == 7 ~ -3,
                       .default = .x)
           )
    )
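
Note that a dplyr pipeline returns a new data frame and leaves `demodf` untouched, so the result has to be assigned if you want to keep it. A minimal sketch of that, assuming the same toy data as above and R 4.1 or later for the `|>` pipe:

```r
library(dplyr)

demodf <- data.frame(col1 = 1:10, col2 = 3:12, col3 = 5:14, col4 = 7:16)
cols_to_amend <- c("col1", "col3")

# Assign the result back, otherwise it is only printed and then discarded
demodf <- demodf |>
  mutate(
    across(all_of(cols_to_amend),
           ~ ifelse(.x %in% 4:7, 4 - .x, .x))
  )
```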

Second answer:

Replace the names in `columns_to_modify` with the columns you want to change, and it should work.

library(dplyr)

df <- data.frame(
  ID = 1:5,
  col1 = c(4, 5, 6, 7, 8),
  col2 = c(4, 5, 6, 7, 8),
  col3 = c(4, 5, 6, 7, 8),
  col4 = c(4, 5, 6, 7, 8),
  col5 = c(4, 5, 6, 7, 8),
  col6 = c(4, 5, 6, 7, 8),
  col7 = c(4, 5, 6, 7, 8),
  col8 = c(4, 5, 6, 7, 8)
)

# Specify the columns you want to modify
columns_to_modify <- c("col5", "col6", "col7")

# Specify a replacement function for the corresponding values
my_fun_replace <- \(vec) {
  case_match(
    vec,
    4 ~ 0,
    5 ~ -1,
    6 ~ -2,
    7 ~ -3,
    .default = vec
  )
}

# Use across to replace values in specified columns
df <- df %>%
  mutate(across(all_of(columns_to_modify), my_fun_replace))
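
For the eight columns in the question, the column vector could be built rather than typed out. This is a sketch under the assumption that the columns really are named `col1` to `col8`; adjust the names to match your data. `case_match` needs dplyr 1.1.0 or later.

```r
library(dplyr)

# Toy data with the assumed column names col1..col8
df <- data.frame(
  ID = 1:5,
  col1 = c(4, 5, 6, 7, 8), col2 = c(4, 5, 6, 7, 8),
  col3 = c(4, 5, 6, 7, 8), col4 = c(4, 5, 6, 7, 8),
  col5 = c(4, 5, 6, 7, 8), col6 = c(4, 5, 6, 7, 8),
  col7 = c(4, 5, 6, 7, 8), col8 = c(4, 5, 6, 7, 8)
)

# Assumption: the eight target columns follow the pattern "col" + number
columns_to_modify <- paste0("col", 1:8)

# Same replacement function as in the answer above
my_fun_replace <- \(vec) {
  case_match(vec, 4 ~ 0, 5 ~ -1, 6 ~ -2, 7 ~ -3, .default = vec)
}

df <- df %>%
  mutate(across(all_of(columns_to_modify), my_fun_replace))
```

`all_of` raises an error if any of the named columns is missing, which helps catch a misspelled column name early.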