I am trying to convert a wide dataframe that contains the following columns:
id, age, gender, AAA1, value11, AAA2, value12, ..., BBB1, value21, BBB2, value22, ..., CCC1, value31, CCC2, value32,...
dput
df2.id df2.dummy_age df2.dummy_gender df2.SVO1 df2.trip11 df2.SVO2 df2.trip12 df2.SVA1 df2.trip21 df2.SVA2 df2.trip22 df2.OVA1 df2.trip31 df2.OVA2 df2.trip32
1 1 0 0 2 123 1 146 2 134 2 143 2 143 2 128
2 2 0 0 2 128 2 155 2 134 2 143 2 143 2 158
3 3 0 0 1 152 1 161 2 134 2 143 2 143 2 128
4 4 0 0 2 128 2 143 2 149 2 149 2 128 2 128
5 5 0 0 2 134 1 120 2 134 2 143 2 155 1 149
6 6 0 1 1 146 2 155 2 134 1 112 2 143 1 155
7 7 0 0 2 149 2 123 2 143 2 143 1 123 2 134
8 8 0 1 1 128 1 149 1 140 2 143 2 134 1 155
9 9 1 1 2 129 2 113 2 149 2 128 2 134 2 134
10 10 0 0 2 128 1 137 2 149 2 143 2 143 2 134
to
id, age, gender,
type = AAA, BBB, CCC, ...
stage = * where AAA*, BBB*, CCC*, ...
period = ** where value**
value = actual values under value11, value12, ...., value21, value22, ...
Example for id = 1
id dummy_age dummy_gender type stage period value
1 1 0 0 SVO 1 11 123
2 1 0 0 SVO 2 12 146
3 1 0 0 SVA 1 21 134
4 1 0 0 SVA 2 22 143
5 1 0 0 OVA 1 31 143
6 1 0 0 OVA 2 32 128
I can't seem to find a good way to use dplyr to do this. Appreciate any help on this.
Without any data, its hard to know what the format of the dataframe really is. With that being said, I imagen that
XXX##is just a string of that name? Given your desired output, it looks like all the type info is held within thevalue##column names, so we can probably just drop theXXX##columns,pivot_longerand derive the type data after.