I have millions of docs containing start and end times of something they're measuring.
[
{
"start" : ISODate("2024-01-01T00:00:00"),
"end" : ISODate("2024-01-01T01:00:00")
},
{
"start" : ISODate("2024-01-01T01:00:00"),
"end" : ISODate("2024-01-01T02:00:00")
},
{
"start" : ISODate("2024-01-01T03:00:00"),
"end" : ISODate("2024-01-01T04:00:00")
},
]
I would like a way of generating a result set containing only the non-contiguous blocks. For this example, that would look like:
[
{
"start" : ISODate("2024-01-01T00:00:00"),
"end" : ISODate("2024-01-01T02:00:00")
},
{
"start" : ISODate("2024-01-01T03:00:00"),
"end" : ISODate("2024-01-01T04:00:00")
},
]
I'm experienced with using aggregate and $group, but I can't think of a way to acheive what I need.
Any suggestions would be much appreciated.
You can achieve the expected behaviour through 2
$setWindowFields. For the first$setWindowFields, you can find the$maxendfield for each document in the document range of["unbounded", -1]that is sorted by{start: 1}. i.e. For a current document, you will search in the document range that has astartsmaller than the current document and get the maxend. We call this valueprevMax.Then, by comparing the
prevMaxwith current document'sstart. You can determine if it should be start of a group. We call this booleanisStart.After that, by using another
$setWindowFields, you can conditionally find thegroupingof all documents throughisStartand in the document range of[current, unbounded]Finally, using the
groupingfield to find$maxendinside the grouping.Mongo Playground