R Replace Keys of Multiple Columns with a Key Data frame


I have a dataset with 100 questions (below I have a subset with 3 questions). I want to replace all the answer IDs with the actual answers provided in the "answer" dataset. The final result is shown in the "result" data frame.

data

  name q1 q2 q3
1    a  1  3  7
2    a  8  3  1
3    a  3  9  2
4    b  4  4  3

answer

 id  str
  1  TRUE
  2  FALSE
  3  YES
  4  NO
  5  LESS
  6  MORE
  7  GREATER
  8  LESS
  9  NONE
 10  DAILY

result

  name     q1   q2     q3
1    a    TRUE  YES   GREATER
2    a    LESS  YES   TRUE
3    a    YES   NONE  FALSE
4    b    NO    NO    YES

There are 2 best solutions below


We can match the elements of the dataset ('df1', i.e. the question's 'data', without the 'name' column) against the 'id' column of 'answer' to get the row index, and then use that index to pull the corresponding 'str'. In this case match is not strictly needed, because the ids are 1 to 10 in row order, but in general it is safer to use match.

 df1[-1] <- answer$str[match(as.matrix(df1[-1]), answer$id)]
 df1
 # name   q1   q2      q3
 #1    a TRUE  YES GREATER
 #2    a LESS  YES    TRUE
 #3    a  YES NONE   FALSE
 #4    b   NO   NO     YES
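
To see why match is the safer choice, here is a minimal sketch that assumes the rows of 'answer' are stored in a different order. The reversed copy 'answer_shuffled' is hypothetical and not part of the question. Positional indexing then picks the wrong strings, while match still looks each id up by value.

```r
# Rebuild the question's data; stringsAsFactors = FALSE keeps 'str' as character
df1 <- data.frame(name = c("a", "a", "a", "b"),
                  q1 = c(1, 8, 3, 4),
                  q2 = c(3, 3, 9, 4),
                  q3 = c(7, 1, 2, 3),
                  stringsAsFactors = FALSE)
answer <- data.frame(id = 1:10,
                     str = c("TRUE", "FALSE", "YES", "NO", "LESS", "MORE",
                             "GREATER", "LESS", "NONE", "DAILY"),
                     stringsAsFactors = FALSE)

# Hypothetical: the same key table with its rows in reverse order
answer_shuffled <- answer[rev(seq_len(nrow(answer))), ]

# match() finds the row whose id equals each value, so row order is irrelevant
by_match <- df1
by_match[-1] <- answer_shuffled$str[match(as.matrix(df1[-1]), answer_shuffled$id)]

# Positional indexing treats each value as a row position, so it breaks here
by_position <- df1
by_position[-1] <- answer_shuffled$str[as.matrix(df1[-1])]

by_match$q1      # "TRUE" "LESS" "YES" "NO"
by_position$q1   # "DAILY" "YES" "LESS" "GREATER"
```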

Or use lookup from qdapTools, which can take the key/value columns as a 'data.frame' (i.e. 'answer') and return the matching values:

 library(qdapTools)
 df1[-1] <- lookup(unlist(df1[-1]), answer)

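Or build a named vector from 'answer' (names are the ids, values are the strings) and index it by name: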

 df1[-1] <- with(answer, setNames(str, id))[as.character(unlist(df1[-1]))]
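
The named-vector lookup is easier to follow when the intermediate vector is stored first. A minimal sketch, assuming 'df1' and 'answer' are rebuilt as below; the name 'key' is only illustrative. An id that is missing from the key table comes back as NA rather than raising an error.

```r
df1 <- data.frame(name = c("a", "a", "a", "b"),
                  q1 = c(1, 8, 3, 4),
                  q2 = c(3, 3, 9, 4),
                  q3 = c(7, 1, 2, 3),
                  stringsAsFactors = FALSE)
answer <- data.frame(id = 1:10,
                     str = c("TRUE", "FALSE", "YES", "NO", "LESS", "MORE",
                             "GREATER", "LESS", "NONE", "DAILY"),
                     stringsAsFactors = FALSE)

# Named character vector: names are the ids, values are the answer strings
key <- setNames(answer$str, answer$id)

# Lookup is by name, so the ids must be passed as character
found   <- unname(key[c("3", "9")])   # "YES" "NONE"
missing <- unname(key["11"])          # NA, because id 11 is not in 'answer'

# Apply to every question column at once; unname() drops the id labels
df1[-1] <- unname(key[as.character(unlist(df1[-1]))])
df1
```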

Or use indexing:

data[-1] <- sapply(data[-1], function(x) answer$str[x])
#   name   q1   q2      q3
# 1    a TRUE  YES GREATER
# 2    a LESS  YES    TRUE
# 3    a  YES NONE   FALSE
# 4    b   NO   NO     YES

Larger tasks can be broken down into simplified examples to test methods. Create a vector with the q1 values only:

v <- c(1, 8, 3, 4)

If we can replace these four, it is quite possible to scale the operation:

answer$str[v]
[1] TRUE LESS YES  NO 

This creates the first question column. The remainder of the code is repeating that process for each column.
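
Repeating the process for every question column can be sketched with lapply. This assumes the question columns are the ones whose names start with "q" (an assumption based on the sample data), and it swaps in match so the result does not depend on the row order of 'answer'.

```r
data <- data.frame(name = c("a", "a", "a", "b"),
                   q1 = c(1, 8, 3, 4),
                   q2 = c(3, 3, 9, 4),
                   q3 = c(7, 1, 2, 3),
                   stringsAsFactors = FALSE)
answer <- data.frame(id = 1:10,
                     str = c("TRUE", "FALSE", "YES", "NO", "LESS", "MORE",
                             "GREATER", "LESS", "NONE", "DAILY"),
                     stringsAsFactors = FALSE)

# Pick the question columns by name instead of by position
# (assumes every question column is named q1, q2, ...)
q_cols <- grep("^q", names(data), value = TRUE)

# Apply the same single-vector replacement to each question column
data[q_cols] <- lapply(data[q_cols], function(x) answer$str[match(x, answer$id)])
data
```

Selecting by name leaves 'name' and any other non-question column untouched, which matters more with 100 questions than with 3.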

Edit

A quicker way without sapply. The values are used as row positions in 'answer', so it only works as long as the ids in the lookup table run 1, 2, ..., n in row order, with no gaps and no repeats:

data[-1] <- answer$str[as.matrix(data[-1])]
#   name   q1   q2      q3
# 1    a TRUE  YES GREATER
# 2    a LESS  YES    TRUE
# 3    a  YES NONE   FALSE
# 4    b   NO   NO     YES
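
Since the positional shortcut depends on the layout of 'answer', it can help to check that layout first. The helper below, 'replace_ids', is a hypothetical name and only a sketch: it uses positional indexing when the ids are exactly 1..n in row order and falls back to match otherwise.

```r
# Hypothetical helper: replace id values in 'cols' with strings from 'answer'
replace_ids <- function(data, answer, cols = names(data)[-1]) {
  ids <- as.matrix(data[cols])
  # Positional indexing is only valid when id i sits in row i
  in_order <- isTRUE(all.equal(answer$id, seq_len(nrow(answer))))
  idx <- if (in_order) ids else match(ids, answer$id)
  data[cols] <- answer$str[idx]
  data
}

data <- data.frame(name = c("a", "a", "a", "b"),
                   q1 = c(1, 8, 3, 4),
                   q2 = c(3, 3, 9, 4),
                   q3 = c(7, 1, 2, 3),
                   stringsAsFactors = FALSE)
answer <- data.frame(id = 1:10,
                     str = c("TRUE", "FALSE", "YES", "NO", "LESS", "MORE",
                             "GREATER", "LESS", "NONE", "DAILY"),
                     stringsAsFactors = FALSE)

# Ordered key table: takes the positional path
result <- replace_ids(data, answer)

# Reversed key table: takes the match() path and gives the same answer
answer_rev <- answer[rev(seq_len(nrow(answer))), ]
result_rev <- replace_ids(data, answer_rev)
```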