Time difference calculated from wide data with missing rows

Question

473 views · Asked by T Richard on 27 June 2025 at 12:32

I have a longitudinal data set in wide format, from which I want to compute the time (in years and days) between the first observation date and the last date an individual was observed. Dates are in the format yyyy-mm-dd. The data set has four observation periods with some missing dates; an example is as follows:

df1<-data.frame("id"=c(1:4),
           "adate"=c("2011-06-18","2011-06-18","2011-04-09","2011-05-20"),
           "bdate"=c("2012-06-15","2012-06-15",NA,"2012-05-23"),
           "cdate"=c("2013-06-18","2013-06-18","2013-04-09",NA),
           "ddate"=c("2014-06-15",NA,"2014-04-11",NA))

Here "adate" is the first date, and the last date is the date an individual was last seen. To compute the time difference (lastdate - adate), I have tried the "lubridate" package, for example:

lubridate::time_length(difftime(as.Date("2012-05-23"), as.Date("2011-05-20")),"years")

However, the last date does not always come from the same column, so I can't simply subtract two fixed columns. I'm looking for a way to automate the calculation in R. The expected output would look like this:

  id years days
1  1  2.99 1093
2  2  2.00  731
3  3  3.01 1098
4  4  1.01  369

Years are rounded to two decimal places.
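The core of the problem is that the "last date" is the latest non-missing value in each row, wherever it falls. As a minimal base R sketch, assuming the four columns are in chronological order (so the latest date is also the last observation) and using the illustrative names `dates` and `last_seen`, `pmax()` with `na.rm = TRUE` picks that date row by row:

```r
# Example data from the question
df1 <- data.frame(id    = 1:4,
                  adate = c("2011-06-18", "2011-06-18", "2011-04-09", "2011-05-20"),
                  bdate = c("2012-06-15", "2012-06-15", NA, "2012-05-23"),
                  cdate = c("2013-06-18", "2013-06-18", "2013-04-09", NA),
                  ddate = c("2014-06-15", NA, "2014-04-11", NA))

# Convert each date column to class Date; missing entries stay NA
dates <- lapply(df1[-1], as.Date)

# Row-wise latest non-missing date across the four periods
last_seen <- do.call(pmax, c(dates, na.rm = TRUE))

# Elapsed days since the first observation, and years with a 365.25-day year
days <- as.numeric(last_seen - dates$adate)
data.frame(id = df1$id, years = round(days / 365.25, 2), days = days)
```

Because `pmax()` is vectorised over rows, this avoids an explicit loop; if the columns were not in date order, it would still return the latest date, which may or may not be what "last seen" should mean.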

Answers

akrun On 31 July 2020 at 22:10

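We could use `pmap` to loop over the rows, drop the missing dates in each row, and take the difference between the last and the first remaining date: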

library(dplyr)
library(purrr)
library(tidyr)
df1 %>%
  mutate(out = pmap(.[-1], ~ {
    dates <- as.Date(na.omit(c(...)))          # this row's non-missing dates
    dt <- difftime(last(dates), first(dates))  # last minus first observation
    tibble(years = lubridate::time_length(dt, "years"),
           days  = lubridate::time_length(dt, "days"))
  })) %>%
  unnest_wider(out)
# A tibble: 4 x 7
#     id adate      bdate      cdate      ddate      years  days
#  <int> <chr>      <chr>      <chr>      <chr>      <dbl> <dbl>
#1     1 2011-06-18 2012-06-15 2013-06-18 2014-06-15  2.99  1093
#2     2 2011-06-18 2012-06-15 2013-06-18 <NA>        2.00   731
#3     3 2011-04-09 <NA>       2013-04-09 2014-04-11  3.01  1098
#4     4 2011-05-20 2012-05-23 <NA>       <NA>        1.01   369
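For readers new to `pmap()`: it walks over the columns in parallel and calls the function once per row, passing that row's values as separate arguments, which the formula shorthand gathers back into one vector with `c(...)`. A minimal sketch of just that step (the name `row_dates` is illustrative) shows the per-row date vectors the answer then works with:

```r
library(purrr)

# Example data from the question
df1 <- data.frame(id    = 1:4,
                  adate = c("2011-06-18", "2011-06-18", "2011-04-09", "2011-05-20"),
                  bdate = c("2012-06-15", "2012-06-15", NA, "2012-05-23"),
                  cdate = c("2013-06-18", "2013-06-18", "2013-04-09", NA),
                  ddate = c("2014-06-15", NA, "2014-04-11", NA))

# One call per row: c(...) gathers that row's four values into a vector,
# na.omit() drops the missing periods, and as.Date() converts the rest
row_dates <- pmap(df1[-1], ~ as.Date(na.omit(c(...))))

lengths(row_dates)  # number of observed dates per individual
```

`first()` and `last()` in the answer then simply pick the first and last elements of each of these vectors.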

Ronak Shah On 01 August 2020 at 01:13

Using base R `apply`:

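df1[-1] <- lapply(df1[-1], as.Date)

df1[c('years', 'days')] <- t(apply(df1[-1], 1, function(x) {
  x <- as.Date(na.omit(x))  # apply() passes each row as character; convert back to Date
  d <- as.numeric(difftime(x[length(x)], x[1], units = 'days'))  # last minus first date
  c(d / 365, d)             # years (365-day year) and days
}))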

df1[c('id', 'years', 'days')]
#  id    years days
#1  1 2.994521 1093
#2  2 2.002740  731
#3  3 3.008219 1098
#4  4 1.010959  369
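Two details of this answer are worth a closer look. First, `apply()` converts a data frame to a matrix before looping, and a data frame of `Date` columns becomes a character matrix, so each row arrives as strings rather than dates. Second, dividing by 365 gives slightly different years than lubridate's `time_length()`, which counts a year as 365.25 days (2.994521 here versus about 2.992 with a 365.25-day year); for this example both round to the expected values. A base R sketch that sidesteps the coercion by building a plain numeric matrix of day counts with `sapply()` (the names `m` and `days` are illustrative), and compares both year lengths:

```r
# Example data from the question
df1 <- data.frame(id    = 1:4,
                  adate = c("2011-06-18", "2011-06-18", "2011-04-09", "2011-05-20"),
                  bdate = c("2012-06-15", "2012-06-15", NA, "2012-05-23"),
                  cdate = c("2013-06-18", "2013-06-18", "2013-04-09", NA),
                  ddate = c("2014-06-15", NA, "2014-04-11", NA))

# sapply() simplifies the list of Date columns to a numeric matrix of day
# counts (days since 1970-01-01), so apply() below works on numbers
m <- sapply(df1[-1], as.Date)

days <- apply(m, 1, function(x) {
  x <- x[!is.na(x)]    # keep only the observed periods
  x[length(x)] - x[1]  # last observed day minus first observed day
})

# Years with a fixed 365-day year versus lubridate's 365.25-day year
data.frame(id = df1$id, days = days,
           years_365 = round(days / 365, 2),
           years_365.25 = round(days / 365.25, 2))
```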

Onyambu On 31 July 2020 at 22:50

Some of the functions used in the other answers may be fairly complex, and they are worth learning if you can. Here, though, is a base R approach:

dates <- unlist(lapply(df1[-1], as.Date))               # every date as a day number, column by column
grp <- droplevels(interaction(df1[, 1], row(df1[-1])))  # create a grouping: one level per row

days <- tapply(dates, grp, function(x) max(x, na.rm = TRUE) - x[1])  # get the difference

cbind(df1[1], days, years = round(days/365, 2))         # create your table

    id days years
1.1  1 1093  2.99
2.2  2  731  2.00
3.3  3 1098  3.01
4.4  4  369  1.01
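The `interaction()` call works because `unlist()` stacks the date columns one after another, and `row()` returns the matching row number for every cell in the same column-by-column order; pairing it with the id mainly labels the groups. A minimal sketch, assuming each row holds a distinct individual, shows that grouping on `row()` alone gives the same day counts (`vals` and `grp` are illustrative names):

```r
# Example data from the question
df1 <- data.frame(id    = 1:4,
                  adate = c("2011-06-18", "2011-06-18", "2011-04-09", "2011-05-20"),
                  bdate = c("2012-06-15", "2012-06-15", NA, "2012-05-23"),
                  cdate = c("2013-06-18", "2013-06-18", "2013-04-09", NA),
                  ddate = c("2014-06-15", NA, "2014-04-11", NA))

# Day numbers for all 16 cells, stacked column by column (adate first)
vals <- unlist(lapply(df1[-1], as.Date))

# Row index of each cell, in the same column-major order as `vals`
grp <- row(df1[-1])

# Within each row: latest observed day minus the adate value (x[1])
days <- tapply(vals, grp, function(x) max(x, na.rm = TRUE) - x[1])
as.vector(days)
```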

If you are comfortable with higher-level functions such as `reshape()` and `aggregate()`, you could instead do:

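long <- reshape(df1, varying = list(2:ncol(df1)), direction = "long")  # dates stacked into "adate"
dat <- aggregate(adate ~ id, long, function(x) as.numeric(max(as.Date(x)) - as.Date(x)[1]))  # days
transform(dat, year = round(adate/365, 2))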
  id adate  year
1  1 1093  2.99 
2  2  731  2.00 
3  3 1098  3.01 
4  4  369  1.01
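In the long format produced by `reshape()`, all four periods are stacked into a single column, which takes its name from the first varying column (`adate`); that is why the formula reads `adate ~ id` even though the column holds every observation date. A quick sketch for inspecting that intermediate structure (`long` is an illustrative name):

```r
# Example data from the question
df1 <- data.frame(id    = 1:4,
                  adate = c("2011-06-18", "2011-06-18", "2011-04-09", "2011-05-20"),
                  bdate = c("2012-06-15", "2012-06-15", NA, "2012-05-23"),
                  cdate = c("2013-06-18", "2013-06-18", "2013-04-09", NA),
                  ddate = c("2014-06-15", NA, "2014-04-11", NA))

# Stack the four date columns into one: 4 ids x 4 periods = 16 rows
long <- reshape(df1, varying = list(2:ncol(df1)), direction = "long")

dim(long)                         # 16 rows, 3 columns
names(long)                       # id, the period index "time", and the stacked dates in "adate"
head(long[order(long$id), ], 4)   # all four periods for id 1, including any NA dates
```

The NA rows are still present at this stage; `aggregate()` with a formula drops them through its default `na.action = na.omit`.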

**M--** · Accepted Answer

Another tidyverse solution is to convert the data to long format, remove the NA dates, and take the time difference between the last and the first date for each id.

library(dplyr)
library(tidyr)
library(lubridate)

df1 %>% 
  pivot_longer(-id) %>% 
  na.omit %>% 
  group_by(id) %>% 
  mutate(value = as.Date(value)) %>% 
  summarise(years = time_length(difftime(last(value), first(value)),"years"),
            days = as.numeric(difftime(last(value), first(value))))

#> # A tibble: 4 x 3
#>      id years  days
#>   <int> <dbl> <dbl>
#> 1     1  2.99  1093
#> 2     2  2.00   731
#> 3     3  3.01  1098
#> 4     4  1.01   369
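`first()` and `last()` rely on the row order within each id, which here follows the column order `adate` to `ddate` because `pivot_longer()` keeps it. If that order were not guaranteed, a variant using `min()` and `max()` would be order-independent. A sketch under that assumption, which also uses `pivot_longer()`'s `values_drop_na` argument in place of `na.omit` (`res` is an illustrative name):

```r
library(dplyr)
library(tidyr)

# Example data from the question (dates kept as character)
df1 <- data.frame(id    = 1:4,
                  adate = c("2011-06-18", "2011-06-18", "2011-04-09", "2011-05-20"),
                  bdate = c("2012-06-15", "2012-06-15", NA, "2012-05-23"),
                  cdate = c("2013-06-18", "2013-06-18", "2013-04-09", NA),
                  ddate = c("2014-06-15", NA, "2014-04-11", NA),
                  stringsAsFactors = FALSE)

res <- df1 %>%
  pivot_longer(-id, values_drop_na = TRUE) %>%  # long format, NA dates dropped
  mutate(value = as.Date(value)) %>%
  group_by(id) %>%
  summarise(days  = as.numeric(max(value) - min(value)),  # span in days
            years = round(days / 365.25, 2))              # 365.25-day years
res
```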
