I have a long-formatted set of longitudinal data with two variables that change over time (cognitive_score and motor_score), subject id (labeled subjid) and age of each subject in days at the moment of measurement(labeled agedays). Measurements were taken twice.

I want to transform it to wide-formatted longitudinal dataset.

The problem is that agedays measurements are unique for each subject, and the only way to see which measurement entry was the first, and which was the second, is to check where the agedays is higher (agedays higher than in the other entry means second measurement, lower agedays means first measurement).

We thus have this dataset:

    subjid agedays cognitive_score motor_score
        <int>   <int>           <dbl>       <dbl>
    1 4900001     457           0.338       0.176
    2 4900001    1035           0.191       0.216
    3 4900002     639           0.25        0.176
    4 4900002    1248           0.176       0.353
    5 4900003     335           0.103       0.196
    6 4900003     913           0.176       0.196

And what I tried was using reshape:

reshape(dataset_col, direction = "wide", idvar = "subjid", timevar = "agedays", v.names = c("cognitive_score", "motor_score"))

Where dataset_col is the name of the dataset.

What it does, however, is adding these two columns:

result of the above-listed command execution

The numbers in the name of the columns seem to be the values of agedays variable.

Any advice on how I can do this?

1

There are 1 best solutions below

0
On

With libraries dplyr and tidyr:

You can use group_by and mutate to determine which of the ids is the first measurement and which is the second measurement. In this case I used an ifelse statement to determine which measurement is the first, this only works if you have exactly 2 measurements for each subjid.

Then you can use pivot_wider to pivot your data to wide format based on your newly created names column.

library(dplyr)
library(tidyr)
df %>%
  group_by(subjid) %>%
  mutate(names = ifelse(agedays == min(agedays),"first","second")) %>% 
  pivot_wider(names_from = names, values_from = c(agedays,cognitive_score,motor_score))

# A tibble: 3 × 7
# Groups:   subjid [3]
#   subjid agedays_first agedays_second cognitive_score_first cognitive_score_second motor_score_first motor_score_second
#    <int>         <int>          <int>                 <dbl>                  <dbl>             <dbl>              <dbl>
# 1 4900001           457           1035                 0.338                  0.191             0.176              0.216
# 2 4900002           639           1248                 0.25                   0.176             0.176              0.353
# 3 4900003           335            913                 0.103                  0.176             0.196              0.196