I have the following data frame df (fictitious data) with several variables var1, var2, ..., var_n:
var1<-c("A","A","A","B","A","C","C","A", "A", "E", "E", "B")
var2<-c(NA,"1","1","5","6","2","3","1", "1", "3", "3", "2")
id<-c(1,2,2,3,3,4,4,5,5,6,6,7)
df<-data.frame(id, var1, var2)
df
id  var1  var2
1   A     <NA>
2   A     1
2   A     1
3   B     5
3   A     6
4   C     2
4   C     3
5   A     1
5   A     1
6   E     3
6   E     3
7   B     2
The data come from a document analysis in which several coders extracted the values from physical files. Each file has a specific id. Thus, two entries with the same id mean that two different coders coded the same document. For example, in document no. 4 both coders agreed that var1 has the value C, whereas in document no. 3 there is a disagreement (B vs. A).
In order to calculate inter-rater-reliability (irr) I need to restructure the dataframe as follows:
id  var1  var1_coder2  var2  var2_coder2
2   A     A            1     1
3   B     A            5     6
4   C     C            2     3
5   A     A            1     1
6   E     E            3     3
Can anyone tell me how to get this done? Thanks!
You can transform your data with functions from dplyr (group_by, mutate) and tidyr (gather, spread, unite). If you only want to keep the rows where all coders have entered values, you can use filter_all.
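A minimal sketch of how these functions might be chained, using the data from the question. The helper column names coder, variable, value and var_coder are my own choices for illustration, and I am assuming that the row order within each id identifies the coder (first row = coder1, second row = coder2), which the question does not state explicitly.

```r
library(dplyr)
library(tidyr)

# Example data from the question; stringsAsFactors = FALSE keeps var1/var2
# as character vectors regardless of the R version.
var1 <- c("A", "A", "A", "B", "A", "C", "C", "A", "A", "E", "E", "B")
var2 <- c(NA, "1", "1", "5", "6", "2", "3", "1", "1", "3", "3", "2")
id   <- c(1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7)
df   <- data.frame(id, var1, var2, stringsAsFactors = FALSE)

result <- df %>%
  group_by(id) %>%
  # Assumption: the n-th row within an id belongs to coder n.
  mutate(coder = paste0("coder", row_number())) %>%
  ungroup() %>%
  # Long format: one row per id, coder and variable.
  gather(variable, value, var1, var2) %>%
  # Build keys such as "var1_coder1" and "var2_coder2".
  unite(var_coder, variable, coder) %>%
  # Wide format: one row per id, one column per variable/coder pair.
  spread(var_coder, value) %>%
  # Keep only the ids for which every column is filled in.
  filter_all(all_vars(!is.na(.)))

result
```

With this naming the columns come out as var1_coder1, var1_coder2, var2_coder1 and var2_coder2 rather than var1, var1_coder2, ...; rename them afterwards if you need the exact headers from the question. Documents coded by only one coder (ids 1 and 7 here) are dropped by the final filter step. Note that gather, spread and filter_all are marked as superseded in current tidyr/dplyr releases, although they still work.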