I am trying to convert a wide dataframe that contains the following columns:

id, age, gender, AAA1, value11, AAA2, value12, ..., BBB1, value21, BBB2, value22, ..., CCC1, value31, CCC2, value32,...

dput

   df2.id df2.dummy_age df2.dummy_gender df2.SVO1 df2.trip11 df2.SVO2 df2.trip12 df2.SVA1 df2.trip21 df2.SVA2 df2.trip22 df2.OVA1 df2.trip31 df2.OVA2 df2.trip32
1       1             0                0        2        123        1        146        2        134        2        143        2        143        2        128
2       2             0                0        2        128        2        155        2        134        2        143        2        143        2        158
3       3             0                0        1        152        1        161        2        134        2        143        2        143        2        128
4       4             0                0        2        128        2        143        2        149        2        149        2        128        2        128
5       5             0                0        2        134        1        120        2        134        2        143        2        155        1        149
6       6             0                1        1        146        2        155        2        134        1        112        2        143        1        155
7       7             0                0        2        149        2        123        2        143        2        143        1        123        2        134
8       8             0                1        1        128        1        149        1        140        2        143        2        134        1        155
9       9             1                1        2        129        2        113        2        149        2        128        2        134        2        134
10     10             0                0        2        128        1        137        2        149        2        143        2        143        2        134

to

id, age, gender,
type = AAA, BBB, CCC, ...
stage = * where AAA*, BBB*, CCC*, ...
period = ** where value**
value = actual values under value11, value12, ...., value21, value22, ...

Example for id = 1

  id dummy_age dummy_gender type stage period value
1  1         0            0  SVO     1     11   123
2  1         0            0  SVO     2     12   146
3  1         0            0  SVA     1     21   134
4  1         0            0  SVA     2     22   143
5  1         0            0  OVA     1     31   143
6  1         0            0  OVA     2     32   128

I can't seem to find a good way to use dplyr to do this. Appreciate any help on this.

1

There are 1 best solutions below

1
AndS. On

Without any data, its hard to know what the format of the dataframe really is. With that being said, I imagen that XXX## is just a string of that name? Given your desired output, it looks like all the type info is held within the value## column names, so we can probably just drop the XXX## columns, pivot_longer and derive the type data after.

library(tidyverse)

#test data
set.seed(25)
example_data <- tibble(id = letters[1:5],
       age = 21:25,
       gender = "m",
       AAA1 = "AAA1",
       value11 = runif(5),
       AAA2 = "AAA2",
       value12 = runif(5),
       BBB1 = "BBB1",
       value21 = runif(5),
       BBB2 = "BBB2",
       value22 = runif(5),
       )

example_data
#> # A tibble: 5 x 11
#>   id      age gender AAA1  value11 AAA2  value12 BBB1  value21 BBB2  value22
#>   <chr> <int> <chr>  <chr>   <dbl> <chr>   <dbl> <chr>   <dbl> <chr>   <dbl>
#> 1 a        21 m      AAA1    0.416 AAA2   0.985  BBB1    0.328 BBB2    0.148
#> 2 b        22 m      AAA1    0.695 AAA2   0.626  BBB1    0.364 BBB2    0.540
#> 3 c        23 m      AAA1    0.149 AAA2   0.338  BBB1    0.959 BBB2    0.728
#> 4 d        24 m      AAA1    0.897 AAA2   0.0668 BBB1    0.589 BBB2    0.477
#> 5 e        25 m      AAA1    0.124 AAA2   0.282  BBB1    0.695 BBB2    0.729


#solution
example_data |>
  select(id, age, gender, contains("value")) |>
  pivot_longer(cols = contains("value"), 
               names_to = c("stage", "period"), 
               values_to = "value", 
               names_pattern = "value(.)(.)", 
               names_transform = as.numeric) |>
  mutate(type = map_chr(stage, \(x) paste(rep(LETTERS[x], 3), collapse = "" )), 
         .after = gender)
#> # A tibble: 20 x 7
#>    id      age gender type  stage period  value
#>    <chr> <int> <chr>  <chr> <dbl>  <dbl>  <dbl>
#>  1 a        21 m      AAA       1      1 0.416 
#>  2 a        21 m      AAA       1      2 0.985 
#>  3 a        21 m      BBB       2      1 0.328 
#>  4 a        21 m      BBB       2      2 0.148 
#>  5 b        22 m      AAA       1      1 0.695 
#>  6 b        22 m      AAA       1      2 0.626 
#>  7 b        22 m      BBB       2      1 0.364 
#>  8 b        22 m      BBB       2      2 0.540 
#>  9 c        23 m      AAA       1      1 0.149 
#> 10 c        23 m      AAA       1      2 0.338 
#> 11 c        23 m      BBB       2      1 0.959 
#> 12 c        23 m      BBB       2      2 0.728 
#> 13 d        24 m      AAA       1      1 0.897 
#> 14 d        24 m      AAA       1      2 0.0668
#> 15 d        24 m      BBB       2      1 0.589 
#> 16 d        24 m      BBB       2      2 0.477 
#> 17 e        25 m      AAA       1      1 0.124 
#> 18 e        25 m      AAA       1      2 0.282 
#> 19 e        25 m      BBB       2      1 0.695 
#> 20 e        25 m      BBB       2      2 0.729