Recoding into a new categorical variable, where a category gets converted to NA


I am trying to recode one category of a variable to NA, but I keep running into errors. My data looks like this for an income variable:

Income1
0
1
1
2
2
0 

I wanted to remove the 0s and recode them into NAs. The 0s represent respondents who marked down 'choose not to answer'. I have tried this:

> Comm %>%
+   mutate(Income2 = case_when(Income1 = 0 ~ NA_real_,
+                              Income1 = 1 ~ 'Less than 50K'
+                              Income1 = 2 ~ 'More than 50K'))

but I keep getting this error:

Error in `mutate()`:
ℹ In argument: `Income2 = case_when(...)`.
Caused by error in `case_when()`:
! `0` must be a vector with type <logical>.
Instead, it has type <double>.

I tried converting Income1 to a logical, but that did not work either. So I tried the expss package (an SPSS-like package), where I wanted to keep the 1s and 2s as they are:

Comm$Income2 = recode(Comm$Income1, "No answer" = 0 ~ NA_real_, 
                          "Less Than 50K" = 1 ~ 1, 
                          "More Than 50K" = 2 ~ 2)

That did not work because:

Error in process_recodings(x, unlist(list(...), recursive = TRUE), make_empty_vec(x),  : 
  'recode' - labelled recodings should recode into single not-NA value but we have: 0 ~ NA

Thank you for reading, any advice would help!

There are 3 answers below.

Accepted answer

As for recode() from expss: the error message says "labelled recodings should recode into single not-NA value", so you need to remove the label from the recoding to NA. A label on an NA (missing) value is not allowed in either SPSS or expss. The code below works:

Comm$Income2 = recode(Comm$Income1, 0 ~ NA_real_, 
                      "Less Than 50K" = 1 ~ 1, 
                      "More Than 50K" = 2 ~ 2)
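
If the value labels are not needed, the same numeric result (0 becomes NA, 1 and 2 are kept) can be sketched in base R without expss. This is only an illustration on made-up data shaped like the question's; it is not how expss does it, and it attaches no labels.

```r
# Sample data matching the question (0 = "choose not to answer")
Comm <- data.frame(Income1 = c(0, 1, 1, 2, 2, 0))

# Copy the column and overwrite the 0s with NA; the 1s and 2s are untouched
Comm$Income2 <- replace(Comm$Income1, Comm$Income1 == 0, NA)

print(Comm)
```

Income2 stays numeric here, so it can still be labelled or converted to a factor afterwards.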

Answer

First, you forgot the comma after 'Less than 50K' in your case_when() call. Using NA_real_ is also a problem: it is the missing value for double (numeric) vectors, but this mutate() builds a character column, so the NA has the wrong type.

I would suggest NA_character_ instead. You also need to change "=" to "==". Inside a function call a single "=" names an argument rather than testing equality, so case_when() receives the formula 0 ~ NA_real_ as an argument named Income1, and it complains because the left-hand side, 0, is a double and not a logical condition.

This should work now:

Comm %>%
  mutate(Income2 = case_when(Income1 == 0 ~ NA_character_,
                             Income1 == 1 ~ 'Less than 50K',
                             Income1 == 2 ~ 'More than 50K'))
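
To see why the single "=" produces the error about `0`, here is a small base R sketch of how the two spellings arrive inside a function call. The helper show_args() is hypothetical and exists only for this illustration; it is not part of dplyr.

```r
# Hypothetical helper: return the arguments exactly as the call received them
show_args <- function(...) list(...)

# Single "=": the argument is NAMED Income1 and its value is the formula 0 ~ NA_real_
with_single <- show_args(Income1 = 0 ~ NA_real_)
names(with_single)                    # the argument name
lhs_single <- with_single[[1]][[2]]   # left-hand side of the formula: the number 0
is.logical(lhs_single)

# Double "==": an unnamed formula whose left-hand side is a comparison
with_double <- show_args(Income1 == 0 ~ NA_real_)
lhs_double <- with_double[[1]][[2]]   # the unevaluated call Income1 == 0
eval(lhs_double, list(Income1 = c(0, 1, 2)))
```

With "=", the left-hand side is the double 0, which matches the message that `0` must be a logical vector. With "==", the left-hand side evaluates to a logical vector, which is what case_when() expects as a condition.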

Answer

You can convert your variable to a factor whose levels exclude 0; any value that is not a level, here 0, becomes NA. Alternatively, index a vector of labels with match().

library(dplyr)

# Method 1

Comm |> 
  mutate(Income2 = factor(Income1, levels=1:2, labels=c("Less Than 50K", "More Than 50K")))

# Method 2
Comm |> 
  mutate(Income2 = c("Less Than 50K", "More Than 50K")[match(Income1, 1:2)])

  Income1       Income2
1       0          <NA>
2       1 Less Than 50K
3       1 Less Than 50K
4       2 More Than 50K
5       2 More Than 50K
6       0          <NA>

data

Comm = structure(list(Income1 = c(0L, 1L, 1L, 2L, 2L, 0L)), 
                 row.names = c(NA, -6L), 
                 class = "data.frame")
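
The two methods above print the same values but return different types, which matters for later modelling or plotting. A minimal base R sketch on the same data, without dplyr (the variable names income_labels, income_factor and income_char are made up for this illustration):

```r
Comm <- data.frame(Income1 = c(0L, 1L, 1L, 2L, 2L, 0L))
income_labels <- c("Less Than 50K", "More Than 50K")

# Method 1: factor with levels 1 and 2 only, so 0 falls outside the levels and becomes NA
income_factor <- factor(Comm$Income1, levels = 1:2, labels = income_labels)

# Method 2: match() gives NA for 0, and indexing with NA yields NA
income_char <- income_labels[match(Comm$Income1, 1:2)]

class(income_factor)   # a factor
class(income_char)     # a character vector

# Same values once the factor is converted to character
identical(as.character(income_factor), income_char)
```

Use the factor version if the category order should be kept for tables and plots, and the character version if plain text is enough.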