Populating Effectivity Satellites and handling out of order change events

628 Views Asked by At

In https://www.linkedin.com/pulse/data-lt-mysteries-effectivity-satellite-driving-key-patrick-cuba/?trk=read_related_article-card_title Patrick Cuba showed how the insert-only logic work when populating an effectivity satellite.

However when applying this to an event stream where we have out of order events, the ingestion and querying logic seems to explode in complexity.

Example

Consider this series of change events depicting change in a person's employment relationship.

person_id employee_id src_event_time change_type
A X t1 INSERT
A X t2 DELETE
A X t3 INSERT

Consider the following scenario where the event come in the order of t3, t1, t2 and we're ingesting it into the effectivity link satellite table:

1. Processing event t3

person_id employee_id effectivity_start_date effectivity_end_date
A X t3 t9999

2. Processing event t1

Here in our ingestion logic we can identify that the new event time t1 is before the existing event t3. So we set the effectivity to between t1 and t3

person_id employee_id effectivity_start_date effectivity_end_date
A X t3 t9999
A X t1 t3

3. Processing event t2

This is where things get a bit weird. Our logic will have to detect that there are existing effectivity period (from t1 to t3) and then "correct" that by adding a (t1 - t2) effectivity period. Since our constraint is that we only ever INSERT and not update existing rows, we now have two effectivity rows with the same start time t1.

person_id employee_id effectivity_start_date effectivity_end_date
A X t3 t9999
A X t1 t3
A X t1 t2

Question

Is this the correct approach? How would I go about querying the state of a relationship at a given point in time? (For example, how would I query the relationship at t1.5, or t2.5?)

1

There are 1 best solutions below

2
On

It’s not quite correct Data Vault Guru contains better examples, load code and query code