Calculate animal coresidence times in R, i.e. dyadic overlapping date ranges


My data describe the individuals in an animal group over time (individuals are born and individuals die), so each individual overlaps with the others for different lengths of time. Column A is the individual's identity, column B is the start date, and column C is the end date.

I want to create a table or matrix (the latter is probably easier to read) that displays how long each pair of individuals was alive in the group at the same time. I would like to do this in R.

Example data:

ID  start.date  end.date
1   5/22/83     10/31/15
2   7/10/94     3/15/15
3   5/24/96     10/31/15
4   10/1/99     5/12/14

Example output (numbers represent approximate years of overlap):

    1   2   3   4
1   NA  21  19  15
2   NA  NA  19  15
3   NA  NA  NA  15
4   NA  NA  NA  NA

Although I have a specific question in mind (animal coresidence times), the solution/method could be used to calculate dyadic durations for any type of overlapping date ranges.
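The method does generalise: for any two ranges, the overlap is the earlier of the two end dates minus the later of the two start dates, clipped at zero when the ranges do not meet. A minimal base-R sketch of that formula (`overlap_days` is an illustrative name, not an existing function):

```r
# Length in days of the overlap between [start1, end1] and [start2, end2].
# Vectorised over its arguments; negative lengths (no overlap) become 0.
overlap_days <- function(start1, end1, start2, end2) {
  len <- as.numeric(pmin(end1, end2) - pmax(start1, start2), units = "days")
  pmax(len, 0)
}

# Individuals 1 and 2 from the example data
overlap_days(as.Date("1983-05-22"), as.Date("2015-10-31"),
             as.Date("1994-07-10"), as.Date("2015-03-15")) / 365.25
```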

Some superficially similar work has been pointed out here, using the foverlaps function, but this function and all the related documentation of similar problems that I've seen in other questions appear to deal with two data tables. The underlying concept is admittedly similar (finding dates in common among different data), but I don't see how I would use foverlaps to solve my problem, i.e. finding the common ranges among all possible pairs of individuals within a single table. I thought about writing some sort of loop, but this would be cumbersome and gets more difficult as the data table grows.
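An explicit loop can be avoided even in base R. As a minimal sketch (not part of the original question or answer; `ids`, `start`, `end`, `days` and `years` are illustrative names), outer() evaluates a vectorised overlap expression on every pair of row indices at once and returns the matrix directly:

```r
# Example data from the question, parsed from m/d/yy strings
ids   <- 1:4
start <- as.Date(c("5/22/83", "7/10/94", "5/24/96", "10/1/99"), format = "%m/%d/%y")
end   <- as.Date(c("10/31/15", "3/15/15", "10/31/15", "5/12/14"), format = "%m/%d/%y")

# outer() calls the function once with all (i, j) index pairs as vectors
days <- outer(seq_along(ids), seq_along(ids), function(i, j) {
  as.numeric(pmin(end[i], end[j]) - pmax(start[i], start[j]), units = "days")
})
days[days < 0] <- 0                         # pairs that never coresided
years <- round(days / 365.25)
dimnames(years) <- list(ids, ids)
years[lower.tri(years, diag = TRUE)] <- NA  # keep each pair once, as in the example output
years
```

The whole computation is vectorised, so it scales better than a nested loop, although the n-by-n matrix still grows quadratically with the number of individuals.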

Answer

foverlaps() is not required. Instead, each ID is compared with every other ID (and with itself) using a non-equi self-join, which produces the same pairs as combn() would, and the years of overlap are computed using pmin() and pmax().
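To illustrate the comparison with combn(), here is a hedged base-R sketch (it does not use data.table; `ids`, `start`, `end`, `pairs` and `long` are illustrative names holding the example data) that enumerates every unordered pair with combn() and computes the overlap with pmin()/pmax(). It yields a long table comparable to the join result before reshaping, except that combn(x, 2) omits the diagonal:

```r
ids   <- 1:4
start <- as.Date(c("1983-05-22", "1994-07-10", "1996-05-24", "1999-10-01"))
end   <- as.Date(c("2015-10-31", "2015-03-15", "2015-10-31", "2014-05-12"))

# Each column of combn() is one unordered pair of row indices (i < j)
pairs <- combn(seq_along(ids), 2)
i <- pairs[1, ]
j <- pairs[2, ]

long <- data.frame(
  ID1 = ids[i],
  ID2 = ids[j],
  overlap.years = as.numeric(pmin(end[i], end[j]) - pmax(start[i], start[j]),
                             units = "days") / 365.25
)
long <- long[long$overlap.years > 0, ]  # drop pairs that never coresided
long
```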

library(data.table)
# add a dummy ID column to join on for the non-equi join
DT[, join.ID := ID][
  # non-equi self-join: pair each row of DT with every row whose ID is >= its own
  DT, on = .(join.ID >= join.ID)][
    # compute years of overlap (days / 365.25)
    , overlap.years := round(as.integer(
      pmin(end.date, i.end.date) - pmax(start.date, i.start.date)) / 365.25)][
        # remove pairs without overlap (zero or negative values)
        overlap.years > 0][
          # reshape from long to wide format
          , dcast(.SD, i.ID ~ ID, value.var = "overlap.years")]
   i.ID  1  2  3  4
1:    1 32 21 19 15
2:    2 NA 21 19 15
3:    3 NA NA 19 15
4:    4 NA NA NA 15

Note that this differs from the OP's expected result: the main diagonal is included, and it contains the age of each individual. I believe this is valuable information when comparing coresidence times.

Data

library(data.table)
DT <- fread(
  "ID  start.date  end.date
1   5/22/83     10/31/15
2   7/10/94     3/15/15
3   5/24/96     10/31/15
4   10/1/99     5/12/14"
)
# convert date strings to class Date (requires the lubridate package)
cols <- c("start.date", "end.date")
DT[, (cols) := lapply(.SD, lubridate::mdy), .SDcols = cols]
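If lubridate is not available, base R can parse the same m/d/yy strings. A minimal sketch, assuming the example data is read with read.table() into a plain data.frame (`txt` and `df` are illustrative names):

```r
txt <- "ID  start.date  end.date
1   5/22/83     10/31/15
2   7/10/94     3/15/15
3   5/24/96     10/31/15
4   10/1/99     5/12/14"
df <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

cols <- c("start.date", "end.date")
# %y maps 69-99 to 19xx and 00-68 to 20xx, so "83" -> 1983 and "15" -> 2015
df[cols] <- lapply(df[cols], as.Date, format = "%m/%d/%y")
str(df)
```

Inside the data.table code above, the same conversion can be written as `DT[, (cols) := lapply(.SD, as.Date, format = "%m/%d/%y"), .SDcols = cols]`, since lapply() passes the extra `format` argument on to as.Date().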

Multiple periods of coresidence

If an individual leaves the group and returns later, the code above needs to be modified:

# read data of the new case
DT2 <- fread(
  "ID  start.date  end.date
1   5/22/83     10/31/15
2   7/10/94     3/15/15
3   5/24/96     10/31/15
4   10/1/99     5/12/14
4   3/20/15     5/12/16"
)
cols <- c("start.date", "end.date")
DT2[, (cols) := lapply(.SD, lubridate::mdy), .SDcols = cols]
DT2

Note that individual 4 was absent from the group for about 10 months (5/12/14 to 3/20/15).

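DT2[, join.ID := ID][
  # allow.cartesian = TRUE permits the larger join result caused by the repeated ID 4
  DT2, on = .(join.ID >= join.ID), allow.cartesian = TRUE][
    # keep the overlaps unrounded so that they can be summed per pair of IDs
    , overlap.years := as.integer(
      pmin(end.date, i.end.date) - pmax(start.date, i.start.date)) / 365.25][
        overlap.years > 0][
        # sum the overlaps of all periods belonging to the same pair of IDs
        , dcast(.SD, i.ID ~ ID, fun.aggregate = function(x) round(sum(x), 1),
                value.var = "overlap.years", fill = NA)]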
   i.ID    1    2    3    4
1:    1 32.4 20.7 19.4 15.2
2:    2   NA 20.7 18.8 14.6
3:    3   NA   NA 19.4 15.2
4:    4   NA   NA   NA 15.8

Note that the main diagonal no longer denotes age but the total time each individual has spent in the group.
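For readers without data.table, a hedged base-R sketch of the same multi-period calculation: pair every residence period with every other one, keep pairs whose first ID is not larger than the second, and sum the overlaps per pair of IDs with tapply(). The data frame `res` is an illustrative copy of DT2 with ISO dates; `g` and `mat` are also illustrative names.

```r
# One row per period of residence (ID 4 has two periods)
res <- data.frame(
  ID    = c(1, 2, 3, 4, 4),
  start = as.Date(c("1983-05-22", "1994-07-10", "1996-05-24", "1999-10-01", "2015-03-20")),
  end   = as.Date(c("2015-10-31", "2015-03-15", "2015-10-31", "2014-05-12", "2016-05-12"))
)

# All ordered pairs of periods, including each period with itself
g <- expand.grid(a = seq_len(nrow(res)), b = seq_len(nrow(res)))
g <- g[res$ID[g$a] <= res$ID[g$b], ]   # upper triangle including the diagonal
g$years <- as.numeric(pmin(res$end[g$a], res$end[g$b]) -
                      pmax(res$start[g$a], res$start[g$b]), units = "days") / 365.25
g <- g[g$years > 0, ]                   # drop period pairs that do not overlap

# Sum the overlaps of all period pairs belonging to the same pair of IDs;
# empty cells (the lower triangle) become NA
mat <- tapply(g$years, list(res$ID[g$a], res$ID[g$b]), sum)
round(mat, 1)
```

The diagonal sums each individual's periods paired with themselves, so it equals the total time in the group only as long as an individual's own periods do not overlap one another.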