Listing pairwise overlaps of Date time elements in R

73 Views Asked by At

I have a list of Lectures for a university course stored in a data-frame. This is a large complex table with over 1000 rows. I have used simple time in the example, but this is actually date time in the format %d %b %Y %H:%M. I think I should be able to extrapolate to the more complex usage.

essentially:

ModuleCode1 ModuleName Lecturer StartTime   EndTime  Course
   11A       Hist1       Bob      10:30     12:30   Hist
   13A       Hist2       Bob      14:30     15:30   Hist
   13C       Hist3       Steve    11:45     12:45   Hist
   15B       Hist4       Bob      09:40     10:40   Hist
   17B       Hist5       Bob      14:00     15:00   Hist

I am trying to create an output data frame which determines which modules clash in the timetable and at which times. For example:

ModuleCode1  StartTime  EndTime  ModuleCode2 StartTime EndTime
   11A         10:30     12:30      15B         09:40   10:40
   11A         10:30     12:30      13C         11:45   12:45
   13A         10:30     12:30      17B         14:00   15:00

There are a multitude of questions on date time overlaps, but the ones that I can find seem to either work with 2 dataframes, or I can't understand them. I have come across the lubridate and IRanges packages, but cannot work out this specific implementation with date time in a single data frame. It seems as though something which would be generally useful, and most likely would have a simple implementation I am missing. Grateful for any help.

1

There are 1 best solutions below

1
On BEST ANSWER

Here is an sqldf solution. The intervals do NOT overlap iff a.StartTime > b.EndTime or a.EndTime < b.StartTime so they do overlap exactly when the negation of this statement is true, hence:

library(sqldf)
sqldf("select a.ModuleCode1, a.StartTime, a.EndTime, b.ModuleCode1, b.StartTime, b.EndTime
       from DF a join DF b on a.ModuleCode1 < b.ModuleCode1 and
                              a.StartTime <= b.EndTime and 
                              a.EndTime >= b.StartTime")

giving:

  ModuleCode1 StartTime EndTime ModuleCode1 StartTime EndTime
1         11A     10:30   12:30         13C     11:45   12:45
2         11A     10:30   12:30         15B     09:40   10:40
3         13A     14:30   15:30         17B     14:00   15:00

Note: The input in reproducible form is:

Lines <- "ModuleCode1 ModuleName Lecturer StartTime   EndTime  Course
   11A       Hist1       Bob      10:30     12:30   Hist
   13A       Hist2       Bob      14:30     15:30   Hist
   13C       Hist3       Steve    11:45     12:45   Hist
   15B       Hist4       Bob      09:40     10:40   Hist
   17B       Hist5       Bob      14:00     15:00   Hist"

DF <- read.table(text = Lines,  header = TRUE)