Longitudinal dataset - difference between two dates

324 Views Asked by At

I have a longitudinal dataset that I imported in R from Excel that looks like this:

STUDYID  VISIT#  VISITDate 
1         1      2012-12-19
1         2      2018-09-19
2         1      2013-04-03
2         2      2014-05-14
2         3      2016-05-12

In this dataset, each patient/study ID has a different number of visits to the hospital, and their first visit dates which is likely to differ from individual to individual. I want to create a new time variable which is essentially time in years since first visit, so the dataset will look like this:

STUDYID  VISIT#  VISITDate   Time(years)
1         1      2012-12-19    0
1         2      2018-09-19    5
2         1      2013-04-03    0
2         2      2014-05-14    1
2         3      2016-05-12    3

The reason for creating a time variable like this is to assess differential regression effects over time (which is a continuous variable). Is there any way to create a new time variable like this in R so I can use it as an independent variable in my regression analyses?

enter image description here

2

There are 2 best solutions below

2
On BEST ANSWER

Consider ave to calculate the minimum of VISITDate by STUDYID group, then take the date difference with conversion to integer years:

df <- within(df, {
        minVISITDate <- ave(VISITDate, STUDYID, FUN=min)
        Time <- floor(as.double(difftime(VISITDate, minVISITDate, unit="days") / 365))
        rm(minVISITDate)
    })

df
#  STUDYID VISIT#  VISITDate Time
# 1      1      1 2012-12-19    0 
# 2      1      2 2018-09-19    5 
# 3      2      1 2013-04-03    0
# 4      2      2 2014-05-14    1 
# 5      2      3 2016-05-12    3
2
On

Loading up packages:

library(tibble)
library(dplyr)
library(lubridate)

Setting up the data:

dat <- tribble(~STUDYID , ~VISIT , ~VISITDate ,
               1       ,  1    ,  "2012-12-19",
               1       ,  2    ,  "2018-09-19",
               2       ,  1    ,  "2013-04-03",
               2       ,  2    ,  "2014-05-14",
               2       ,  3    ,  "2016-05-12") %>% 
  mutate(VISITDate = as.Date(VISITDate)) 

Creating the wanted variable:

dat %>% 
  group_by(STUDYID) %>% 
  mutate(Time = first(VISITDate) %--% VISITDate,
         Time = as.numeric(Time, "years")) %>% 
  ungroup()

# A tibble: 5 x 4
  STUDYID VISIT VISITDate   Time
    <dbl> <dbl> <date>     <dbl>
1       1     1 2012-12-19  0   
2       1     2 2018-09-19  5.75
3       2     1 2013-04-03  0   
4       2     2 2014-05-14  1.11
5       2     3 2016-05-12  3.11