Group rows in a data frame by ID and stack related values as new columns

I'm trying to transform a data frame in which individuals are grouped by family ID. Instead of:

FAMID1 ID1  Age  Sex  Q1  Q2 ......
21     1    18   M    T   4
21     2    21   F    F   2
22     1    13   F    T   3
22     2    16   F    F   1
23     1    18   M    T   3
23     2    18   M    T   3

I want to get:

FAMID1 ID1  ID2  Age1  Age2  Sex1  Sex2 Q1.1  Q1.2 ...
21     1    2    18    21     M     F    T     F
22     1    2    13    16     F     F    T     F
23     1    2    18    18     M     M    T     T

I've tried melt (from reshape2) and mutate with ddply (from plyr), but I have a few hundred variables and I can't get them all onto the same row of the data frame. I don't want to summarise variables the way aggregate does. I just want to put each family member's row side by side on one line and rename every variable to show which member of the family it belongs to.
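To try the answers below, the sample data from the question can be rebuilt as a data frame named df, the name the answers use. This is a sketch that covers only the six columns shown, not the few hundred real variables. read.table() converts the all-T/F column Q1 to logical, which is why the answers print TRUE/FALSE.

```r
# Rebuild the question's sample data; only the columns shown are included.
df <- read.table(text = "
FAMID1 ID1 Age Sex Q1 Q2
21     1   18  M   T  4
21     2   21  F   F  2
22     1   13  F   T  3
22     2   16  F   F  1
23     1   18  M   T  3
23     2   18  M   T  3
", header = TRUE, stringsAsFactors = FALSE)

# Inspect the column types (Q1 is logical, Sex is character).
str(df)
```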

Base R's reshape() can do this directly:

reshape(direction = "wide", data = df, idvar = "FAMID1",
        v.names = c("ID1", "Age", "Sex", "Q1", "Q2"), timevar = "ID1")

#   FAMID1 ID1.1 Age.1 Sex.1 Q1.1 Q2.1 ID1.2 Age.2 Sex.2  Q1.2 Q2.2
#1:     21     1    18     M TRUE    4     2    21     F FALSE    2
#2:     22     1    13     F TRUE    3     2    16     F FALSE    1
#3:     23     1    18     M TRUE    3     2    18     M  TRUE    3
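With a few hundred variables, typing them all into v.names is impractical. A minimal sketch, assuming the data is laid out as in the question: if v.names is left out, reshape() treats every column other than idvar and timevar as varying and spreads each one into one column per family member. The small df built here is only a stand-in for the real data, and in this variant the ID1 column itself is not carried into the result, since the .1 and .2 suffixes already say which member each value belongs to.

```r
# Small stand-in for the real data (the real frame has hundreds of columns).
df <- data.frame(
  FAMID1 = c(21, 21, 22, 22, 23, 23),
  ID1    = c(1, 2, 1, 2, 1, 2),
  Age    = c(18, 21, 13, 16, 18, 18),
  Sex    = c("M", "F", "F", "F", "M", "M"),
  Q1     = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE),
  Q2     = c(4, 2, 3, 1, 3, 3),
  stringsAsFactors = FALSE
)

# v.names is omitted, so every column except FAMID1 and ID1 is widened,
# with the ID1 value appended after a "." (Age.1, Age.2, ...).
wide <- reshape(df, direction = "wide", idvar = "FAMID1", timevar = "ID1")
names(wide)
```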

You can do this with data.table's long-to-wide reshape, dcast(). Various other examples are listed here.

library(data.table)
setDT(df)

If you have many columns, you don't need to list them: pass every column except the two ID columns as value.var.

cols <- setdiff(colnames(df), c("FAMID1", "ID1"))
dcast(df, FAMID1 ~ ID1, value.var = cols)

Output:

   FAMID1 Age_1 Age_2 Sex_1 Sex_2 Q1_1  Q1_2 Q2_1 Q2_2
1:     21    18    21     M     F TRUE FALSE    4    2
2:     22    13    16     F     F TRUE FALSE    3    1
3:     23    18    18     M     M TRUE  TRUE    3    3
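If some families have more than two members, or the data has no per-family member number like ID1, one option (a sketch, not part of the original answer) is to number the members with data.table's rowid() and cast on that number. The data and the column name member are hypothetical. Combinations that don't exist, such as a third member in a two-person family, come out as NA.

```r
library(data.table)

# Hypothetical data: family 22 has three members, family 23 only one,
# and there is no member-number column.
dt <- data.table(
  FAMID1 = c(21, 21, 22, 22, 22, 23),
  Age    = c(18, 21, 13, 16, 40, 18),
  Sex    = c("M", "F", "F", "F", "M", "M")
)

# Number rows within each family (1, 2, ...) in order of appearance.
dt[, member := rowid(FAMID1)]

# Widen every column except the family ID and the member number;
# missing family/member combinations are filled with NA.
cols <- setdiff(names(dt), c("FAMID1", "member"))
wide <- dcast(dt, FAMID1 ~ member, value.var = cols)
wide
```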