R: How to recode a wide data frame with repeating variables (time) into to long data?

76 Views Asked by At

My data frame consists of several hundred persons (rows) answering 4 * 90 items each. It’s the same 90 questions across four points in time per person. Each person belongs to one of two groups. I checked for measurement invariance across these groups, this is easy with multigroup SEM and data in the wide format. Here is a shortened, reproducible example of my initial data frame:

df <- data.frame (ID = c(1,2,3,4,5),
                  Item1_time1 = c(4,4,5,4,3),
                  Item2_time1 = c(3,4,3,5,3),
                  Item1_time2 = c(5,4,4,5,4),
                  Item2_time2 = c(3,3,4,3,5),
                  group = c(0,1,0,1,0)
)

print(df)
  ID Item1_time1 Item2_time1 Item1_time2 Item2_time2 group
1  1           4           3           5           3     0
2  2           4           4           4           3     1
3  3           5           3           4           4     0
4  4           4           5           5           3     1
5  5           3           3           4           5     0

However, now I want to check for measurement invariance across points in time. To do this, this is what the desired output should look like:

ID  time   Item1   Item2    group
1    1       4       3        0
1    2       5       3        0
2    1       4       4        1
2    2       4       3        1
3    1       5       3        0
3    2       4       4        0
4    1       4       5        1
4    2       5       3        1
5    1       3       3        0
5    2       4       5        0

In order to get this, I tried to convert my initial wide to long data, but I don’t really get the desired output:

library(data.table)
long <- melt(setDT(df), id.vars = c("ID"), variable.name = "time")
print(long)

    ID        time value
 1:  1 Item1_time1     4
 2:  2 Item1_time1     4
 3:  3 Item1_time1     5
 4:  4 Item1_time1     4
 5:  5 Item1_time1     3
 6:  1 Item2_time1     3
 7:  2 Item2_time1     4
 8:  3 Item2_time1     3
 9:  4 Item2_time1     5
10:  5 Item2_time1     3
11:  1 Item1_time2     5
12:  2 Item1_time2     4
13:  3 Item1_time2     4
...

How can I code “collapsed” items so that they are not listed separately, but each in its corresponding time category (see desired output above)?

2

There are 2 best solutions below

0
On BEST ANSWER

We can use pivot_longer with names_pattern to capture the substring ((...)) from the column names to create the .value column and the 'time' column

library(tidyr)
pivot_longer(df, cols = -c(ID, group), names_to = c(".value", "time") , 
      names_pattern = "(\\w+)_\\D+(\\d+)")

-output

# A tibble: 10 × 5
      ID group time  Item1 Item2
   <dbl> <dbl> <chr> <dbl> <dbl>
 1     1     0 1         4     3
 2     1     0 2         5     3
 3     2     1 1         4     4
 4     2     1 2         4     3
 5     3     0 1         5     3
 6     3     0 2         4     4
 7     4     1 1         4     5
 8     4     1 2         5     3
 9     5     0 1         3     3
10     5     0 2         4     5

The (\\w+) - captures one or more characters followed by the _ in the column name, and then any non-digits (\\D+) followed by the second capture group of one or more digits ((\\d+)), which corresponds to the .value (column values) and 'time' gets the digits suffix from the column name

0
On

Like this?

setDT(df)
result <- melt(df, measure.vars=patterns(Item1='Item1', Item2='Item2')   
                 , variable.name = 'time')
setorder(result, ID, time, group)
result     
##     ID group time Item1 Item2
##  1:  1     0    1     4     3
##  2:  1     0    2     5     3
##  3:  2     1    1     4     4
##  4:  2     1    2     4     3
##  5:  3     0    1     5     3
##  6:  3     0    2     4     4
##  7:  4     1    1     4     5
##  8:  4     1    2     5     3
##  9:  5     0    1     3     3
## 10:  5     0    2     4     5