I have a long-formatted set of longitudinal data with two variables that change over time (cognitive_score and motor_score), subject id (labeled subjid) and age of each subject in days at the moment of measurement(labeled agedays). Measurements were taken twice.
I want to transform it to wide-formatted longitudinal dataset.
The problem is that agedays measurements are unique for each subject, and the only way to see which measurement entry was the first, and which was the second, is to check where the agedays is higher (agedays higher than in the other entry means second measurement, lower agedays means first measurement).
We thus have this dataset:
subjid agedays cognitive_score motor_score
<int> <int> <dbl> <dbl>
1 4900001 457 0.338 0.176
2 4900001 1035 0.191 0.216
3 4900002 639 0.25 0.176
4 4900002 1248 0.176 0.353
5 4900003 335 0.103 0.196
6 4900003 913 0.176 0.196
And what I tried was using reshape:
reshape(dataset_col, direction = "wide", idvar = "subjid", timevar = "agedays", v.names = c("cognitive_score", "motor_score"))
Where dataset_col is the name of the dataset.
What it does, however, is adding these two columns:
The numbers in the name of the columns seem to be the values of agedays variable.
Any advice on how I can do this?

With libraries
dplyrandtidyr:You can use
group_byandmutateto determine which of the ids is the first measurement and which is the second measurement. In this case I used anifelsestatement to determine which measurement is the first, this only works if you have exactly 2 measurements for each subjid.Then you can use
pivot_widerto pivot your data to wide format based on your newly created names column.