My data frame consists of several hundred persons (rows) answering 4 * 90 items each. It’s the same 90 questions across four points in time per person. Each person belongs to one of two groups. I checked for measurement invariance across these groups, this is easy with multigroup SEM and data in the wide format. Here is a shortened, reproducible example of my initial data frame:
df <- data.frame (ID = c(1,2,3,4,5),
Item1_time1 = c(4,4,5,4,3),
Item2_time1 = c(3,4,3,5,3),
Item1_time2 = c(5,4,4,5,4),
Item2_time2 = c(3,3,4,3,5),
group = c(0,1,0,1,0)
)
print(df)
ID Item1_time1 Item2_time1 Item1_time2 Item2_time2 group
1 1 4 3 5 3 0
2 2 4 4 4 3 1
3 3 5 3 4 4 0
4 4 4 5 5 3 1
5 5 3 3 4 5 0
However, now I want to check for measurement invariance across points in time. To do this, this is what the desired output should look like:
ID time Item1 Item2 group
1 1 4 3 0
1 2 5 3 0
2 1 4 4 1
2 2 4 3 1
3 1 5 3 0
3 2 4 4 0
4 1 4 5 1
4 2 5 3 1
5 1 3 3 0
5 2 4 5 0
In order to get this, I tried to convert my initial wide to long data, but I don’t really get the desired output:
library(data.table)
long <- melt(setDT(df), id.vars = c("ID"), variable.name = "time")
print(long)
ID time value
1: 1 Item1_time1 4
2: 2 Item1_time1 4
3: 3 Item1_time1 5
4: 4 Item1_time1 4
5: 5 Item1_time1 3
6: 1 Item2_time1 3
7: 2 Item2_time1 4
8: 3 Item2_time1 3
9: 4 Item2_time1 5
10: 5 Item2_time1 3
11: 1 Item1_time2 5
12: 2 Item1_time2 4
13: 3 Item1_time2 4
...
How can I code “collapsed” items so that they are not listed separately, but each in its corresponding time category (see desired output above)?
We can use
pivot_longer
withnames_pattern
to capture the substring ((...)
) from the column names to create the.value
column and the 'time' column-output
The
(\\w+)
- captures one or more characters followed by the_
in the column name, and then any non-digits (\\D+
) followed by the second capture group of one or more digits ((\\d+)
), which corresponds to the.value
(column values) and 'time' gets the digits suffix from the column name