How to change entry values for only certain variables to NA

61 Views Asked by At

Suppose I have 8 variables named MED1 to MED8 in a dataset called 2007d. I want to create a new variable called Totalmeds which will be the sum of all 8 variables in case an entry is present in a certain row. The variables MED1 to MED8 also contains -9 as an entry which is to be treated as null and not counted towards the total for the 8 variables.

For example, if MED 1= 2368, MED 2= 2344, MED3=-9, MED4=2145, MED5=-9, MED6=-9, MED7=8765, and MED8=2345 for a certain row, then I want the total column to show 5 as the output, i.e, not considering -9 as an entry for the total.

I tried individually replacing each column with NA

library{dplyr}
`2007d` <- 
  `2007d` %>%
   replace_with_na(replace = list(MED1 = c(-9.00)))

but since I have multiple columns was wondering if there is a more efficient way.

I also tried using the mutate_at function:

`2007d` <- 
  `2007d` %>%
  mutate_at(vars(starts_with("MED")), na_if, y = -9)

but it is not working.

The initial rows of the 8 columns in the dataset looks like:

enter image description here

I wanted to know if there is an efficient method to change just those variables in the whole dataset so that I do not have to individually make modifications to each variable and then calculate total

1

There are 1 best solutions below

6
Friede On

For example- if MED 1= 2368, MED 2= 2344, MED3=-9, MED4=2145, MED5=-9, MED6=-9, MED7=8765 and MED8=2345 for a certain row, then I want the total column to show 5 as the output

Probably, you are looking for rowSums with condition > 0:

> df = structure(list(MED1 = c(2368, 2368), MED2 = c(2344, -4), 
+                     MED3 = c(-9, -9), MED4 = c(2145, 2145), MED5 = c(-9, -9), 
+                     MED6 = c(-9, -9), MED7 = c(8765, 8765), MED8 = c(2345, 2345)), 
+                row.names = c(NA, -2L), class = "data.frame")
> 
> df$total = rowSums(df > 0) # positives 
> df
  MED1 MED2 MED3 MED4 MED5 MED6 MED7 MED8 total
1 2368 2344   -9 2145   -9   -9 8765 2345     5
2 2368   -4   -9 2145   -9   -9 8765 2345     4

If it's about the sum of positive values per row

> df$pos_rowsum = 
+   apply(df, 1L, \(x) sum(x[x > 0]))
> df
  MED1 MED2 MED3 MED4 MED5 MED6 MED7 MED8 pos_rowsum
1 2368 2344   -9 2145   -9   -9 8765 2345      17967
2 2368   -4   -9 2145   -9   -9 8765 2345      15623

If you want to replace all negative values with 0 simply do

> df[df < 0] = 0
> df
  MED1 MED2 MED3 MED4 MED5 MED6 MED7 MED8
1 2368 2344    0 2145    0    0 8765 2345
2 2368    0    0 2145    0    0 8765 2345

If it's just about -9, do df[df == -9] = 0 instead. This operation is not necessary to perform the above mentioned computations as those ignore negative values.

If there are more columns and we only want to select those containing MED followed by one or two digits, we can do:

df$total = rowSums(df[grep("^MED\\d{1,2}$", colnames(df))] > 0)
df$pos_rowsum = apply(df[grep("^MED\\d{1,2}$", colnames(df))], 1L, \(x) sum(x[x > 0]))

and

df[grep("^MED\\d{1,2}$", colnames(df))][df[grep("^MED\\d{1,2}$", colnames(df))] < 0] = 0