RethinkDB query based on multiple timestamps as secondary indices

136 Views Asked by At

I have a RethinkDB with a table 'events', each event has a 'from' and a 'to' column, both are timestamps.

I want to mainly query this table by returning all events that are "active" in a time window, meaning the from column is before the end of the time window and the end column is after the start of the time window (the start of the time window being the current time, and the end of the time window the current time + 2 hours). For this reason I have created a secondary index on both columns.

Ideally I would have liked to receive this as a single changefeed including any changes to the events in realtime and also selecting events that become active as time moves forward, but from what I can tell this is not possible since according to the docs "The command now() is computed once when the server receives the query" (correct me if I'm wrong), so I will resort to creating a new changefeed every hour that queries for active events in a 2 hour window from the current time.

Now I first thought I would use two between calls like so

let currentEndpoint = moment().add(2, 'hours');

// query for active events
r
  .table('events')
  // event start is before end of window
  .between(new Date(1), currentEndpoint, {index: 'from'})
  // event end is after current time
  .between(new Date(), r.epochTime(1900266776))
  .run(connection)
  .then(res => {
    console.log(res)
  })
  .catch(e => {
    console.error(e);
  })

Which apparently is not allowed: ReqlQueryLogicError: Cannot perform multiple BETWEENs on the same tabl

I understand how it would be possible to do this with the filter command quite easily, but since this is the main way I want to query the table I thought it would be better to use indices, is there a way to create this query?


Visual Example

                       event2
                          \
                      _____\______
                     /            \
time: ---->---->---->---->---->---->---->---->---->---->----> ...
            \__/             \__/        /         \__/
    event1___/        event3__/         /   event5__/
                                event4_/


                  ^              ^ 
query window:     |______________|

expected results: [ event2, event3 ]

Reproducible Problem

Create an events table -

r.tableCreate("events");
// { tables_created: 1, ... }

Add some events. For conceptual simplicity, we only concern ourselves with year, month, and day. Solving the problem for more granular time, such as hours, or minutes is effectively the same

r.table("events").insert([
  { name: "fishing tourney"
  , from: r.time(2020, 1, 11, "Z")
  , to: r.time(2020, 1, 12, "Z")
  }
, { name: "cherry blossom"
  , from: r.time(2020, 4, 1, "Z")
  , to: r.time(2020, 4, 10, "Z")
  }
, { name: "fishing tourney"
  , from: r.time(2020, 4, 11, "Z")
  , to: r.time(2020, 4, 12, "Z")
  }
, { name: "bunny day"
  , from: r.time(2020, 4, 1, "Z")
  , to: r.time(2020, 4, 12, "Z")
  }
, { name: "fishing tourney"
  , from: r.time(2020, 7, 11, "Z")
  , to: r.time(2020, 7, 12, "Z")
  }
]);
// { inserted: 5, generated_keys: [ ... ], ... }

Create some kind of secondary index -

r.table("events").indexCreate(
  "event_window",
  ???
);

Given a date range, get all events with an intersecting event window -

const start = r.time(2020, 4, 8, "Z");   // April 8, 2020
const end = start.add(3 * 24 * 60 * 60); // April 11, 2020, 3 days later

// filter it?
r.table("events").filter(???)...

// use .between somehow?
r.table("events").between(???, ???, { index: "event_window" })

// some other technique?
r.table("events").eqJoin(???)
r.table("events").???

Events overlapping April 8, 2020 through April 11, 2020 -

// expected output
[ { name: "cherry blossom"
  , from: r.time(2020, 4, 1, "Z")
  , to: r.time(2020, 4, 10, "Z")
  }
, { name: "fishing tourney"
  , from: r.time(2020, 4, 11, "Z")
  , to: r.time(2020, 4, 12, "Z")
  }
]
2

There are 2 best solutions below

0
On

The events you will be returning in the result set fall into at least one of those sets:

  1. All events that ended in query window.
  2. All events that started in query window.

(Events that start and end within the query window fall into both sets)

If you index your from and to attributes, you can query for All events that ended in query window, and union that with All events that started in query window. You can still use the correct index for each set of events. Remember to deduplicate for events that happen to fall into both sets.

r.table("events").between(r.time(2020, 4, 8, "Z"), r.time(2020, 4, 11, "Z"), { index: "from", rightBound: "closed" }).union(
  r.table("events").between(r.time(2020, 4, 8, "Z"), r.time(2020, 4, 11, "Z"), { index: "to" })
).distinct()

I can't speak to wether this is idiomatic or best way of accomplishing this, but I had to use something similar in the past.

2
On

Don't know much about RethinkDB so just a crazy idea: maybe geospatial index can help? Definitely not idiomatic, but there is getIntersecting

Get all documents where the given geometry object intersects the geometry object of the requested geospatial index.

And you can think of events' date intervals and window interval as lines (or thin rectangles).