I have a dataframe like the one below in R:
### Packages
library(tidyverse)
library(Epi)
library(survival)
library(lubridate)
### Create data:
End_Date <- as.Date("1968-01-01") + days(sample (c(250:365), size=500, replace =T))
Example_DF <- as.data.frame(End_Date)
Example_DF$Start_Date <- as.Date("1968-01-01")
Example_DF$Exposure <- Example_DF$End_Date - days(sample (c(1:249), size=500, replace =T))
Example_DF$ID <- seq(1,500,1)
What I want do is to for each calendar month from 1968-01 until and including 1969-05 create two new columns per calendar month that is summing up the number of days of person-time each person (ID) is providing as unexposed and exposed, respectively. These columns can for example be called 1968_01_Unexposed, 1968_01_Exposed etc.
The date of exposure is found in the column Exposure. What I want in the end is thus a dataframe with 41 columns (4 in the original dataframe plus 34 columns (2 per 17 calendar month between 1968-01 and 1969-05)). For example ID 1 is having 31 person days as unexposed for 1968-01, 0 days as exposed for 1968-01 etc until 1968-07, where ID 1 has 10 days of unexposed and 21 days as exposed.
Anyone knows how this can be done in a convenient way?
The following should get you going. In fact, you have developed part of the "algorithm" already yourself with the last para of your problem description.
Working with
{tidyverse}
andtibbles
/data frames
try to think in vectors/columns before presenting the result in a more human-readablewide
-way.I demonstrate the initial part of how you can go about it with your first 2 entries and solving the logical conditions on the number of days.
I leave it to you to apply this approach then to the exposed days and read up on
{tidyr}
'spivot_wider()
to spread your results across the desired columns.While you provide some sample data and thus a reproducible example, the sample seems not to operate on 17 months. I did not check the example for further consistency.
This yields:
You should be able to construct the required number of days for the exposed case.
Then read up on
{tidyr}
andpivot_longer
to spread your long table to a wide format that you want to have.