Transforming Dataset from Tall to Wide and Adding Additional Rows

19 Views Asked by At

I have a data frame that contains the following columns: STU_ID, INCODE, F4SECT, PSE94FLG, pse_case_count, INSTNUM, and MAJOR.

The data frame is currently in tall format and I need to convert it to wide format so that the following happens:

  1. There should be one row per STU_ID, as the tall format has duplicate student ids as they selected different college majors.
  2. A new colum variable is created that iterates over the values of INSTNUM, so that INSTUNM is an index value. The new columns created should be based on INCODE and MAJOR, so for example if there are three rows for STU_ID, there should be values for the new columns of INCODE1, INCODE2, INCODE3, etc. and MAJOR1, MAJOR2, MAJOR3, etc.

The final data frame should have one row per STU_ID, the appropriate number of INCODE values - for example a student attended 2 separate universities for 2 different majors - and the appropriate number of MAJOR values - for example the student that attended or changed majors the 1st major is listed in one column, the 2nd major is listed in another column.

What I've tried

library(tidyverse)

test_merged_pse_data <- merged_pse_data %>% pivot_wider(names_from = INSTNUM, values_from = c(MAJOR, INCODE))

The problem I'm experiencing is this hasn't combined the same STU_ID, so, for example, I get a student who has had three different majors showing up multiple times on separate rows as opposed to the table showing the following:

STU_ID | F4SECT | pse_case_count | MAJOR1 | MAJOR2 | MAJOR3 | INCODE1 | INCODE2 | INCODE3| 123456 | 4 | 3 | PHARM | UNDEC | EDU | 2202 | 2320 | 2324 |

0

There are 0 best solutions below