I am trying to combine 2 datasets of unequal length for a longitudinal study, in a similar way to the example below. Dataset 1 includes each participant only once, with the row of data from their first weekly survey. Dataset 2 includes all surveys from all participants. I am trying to create a third dataset that accounts for missing weekly surveys. For example, if participant 2 missed their survey on the 17th of Jan, the output should still show week 2, the participant ID and the date, with the rest of the columns blank. Any ideas on how to accomplish this are much appreciated, as I am very new to R.
#dataframe 1 (many more value cols)
ID date value Weeknumber
1 March 1 8 1
2 Jan 10 9 1
3 April 12 12 1
4 Dec 9 6 1
#Dataframe 2
ID date value
1 March 1 8
1 March 8 3
1 March 15 9
1 March 22 11
1 March 29 5
2 Jan 10 9
2 Jan 24 5
2 Jan 31 12
2 Feb 7 7
3 April 12 12
3 April 19 3
3 April 26 10
3 May 2 6
4 Dec 9 6
4 Dec 30 7
4 Jan 6 11
#Desired output:
ID  Date      Value  Week number
1   March 1   8      1
1   March 8   3      2
1   March 15  9      3
1   March 22  11     4
1   March 29  5      5
2   Jan 10    9      1
2   Jan 17           2
2   Jan 24    5      3
2   Jan 31    12     4
2   Feb 7     7      5
3   April 12  12     1
3   April 19  3      2
3   April 26  10     3
3   May 2     6      4
3   May 9            5
4   Dec 9     6      1
4   Dec 16           2
4   Dec 23           3
4   Dec 30    7      4
4   Jan 6     11     5
Here is another approach to consider, using tidyverse.
First, I would consider including years in your dates. With the year included, leap years are taken into account when working out the dates of the missing weeks. Since you mention being very new to R, let me know if you want me to add details on converting the dates.
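As a rough illustration of the date conversion, the sketch below turns month-day strings like those in the question into Date values. The `raw` data frame, its `year` column and the year 2021 are assumptions made for the example; the question does not give years. It also assumes an English locale for month names.

```r
# Hypothetical input: month-day strings as in the question, plus an assumed year
raw <- data.frame(
  ID   = c(1, 2, 3, 4),
  date = c("March 1", "Jan 10", "April 12", "Dec 9"),
  year = c(2021, 2021, 2021, 2021)  # placeholder year, not from the question
)

# "%b" accepts abbreviated and full month names on input (English locale assumed)
raw$date <- as.Date(paste(raw$date, raw$year), format = "%b %d %Y")

# Why the year matters: one week after Feb 24 lands on a different day
# in a leap year than in a non-leap year
leap     <- seq(as.Date("2020-02-24"), by = "week", length.out = 2)  # ends 2020-03-02
non_leap <- seq(as.Date("2021-02-24"), by = "week", length.out = 2)  # ends 2021-03-03
```

Once the column is a real Date, weekly sequences and joins on date behave as expected.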
Next, after selecting ID and date from your first data frame df1, you can group_by ID, so that subsequent steps are done within each ID. Using mutate and map, you can add rows with a sequence of 5 weeks starting from the original date.
After that, you can merge in the other data frame df2 with left_join. The missing weeks will have NA for value. Finally, we can add the row_number() within each ID to be the Weeknumber.
One final concern with the example data: the dates April 26 and May 2 are only 6 days apart. The join matches on exact dates, so it would miss a survey that does not fall exactly a whole number of weeks after the first one. There could be alternative approaches if the dates are not exactly one week apart.
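The steps described above could be sketched roughly as follows. This is a minimal example rather than a definitive implementation: the years (2021, rolling into 2022 for participant 4) are assumed, only one value column is included, and `unnest` is used to expand the list column that `map` creates into one row per week.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Example data from the question, with assumed years added
df1 <- tibble(
  ID = 1:4,
  date = as.Date(c("2021-03-01", "2021-01-10", "2021-04-12", "2021-12-09")),
  value = c(8, 9, 12, 6),
  Weeknumber = 1
)

df2 <- tibble(
  ID = rep(1:4, times = c(5, 4, 4, 3)),
  date = as.Date(c(
    "2021-03-01", "2021-03-08", "2021-03-15", "2021-03-22", "2021-03-29",
    "2021-01-10", "2021-01-24", "2021-01-31", "2021-02-07",
    "2021-04-12", "2021-04-19", "2021-04-26", "2021-05-02",
    "2021-12-09", "2021-12-30", "2022-01-06"
  )),
  value = c(8, 3, 9, 11, 5, 9, 5, 12, 7, 12, 3, 10, 6, 6, 7, 11)
)

result <- df1 %>%
  select(ID, date) %>%
  group_by(ID) %>%
  # Build a list column holding 5 weekly dates starting at the first survey
  mutate(date = map(date, ~ seq(.x, by = "week", length.out = 5))) %>%
  # Expand the list column into one row per expected week
  unnest(date) %>%
  # Bring in the survey values; weeks without an exact date match get NA
  left_join(df2, by = c("ID", "date")) %>%
  # Number the weeks within each participant
  mutate(Weeknumber = row_number()) %>%
  ungroup()
```

With this data, participant 2 gets an NA value for the week of Jan 17, and participant 4 for Dec 16 and Dec 23. Participant 3's May 2 survey is not matched, because the expected date is May 3, so week 4 shows NA for that participant. If surveys can drift by a day or so, one possible alternative is to compute the week index from the number of days since the first survey and join on that index instead of the exact date.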
Output