I am looking to figure out if I can solve the following problem in SQL, or if I'm better off selecting the values into my scripting language and just bulk update from there.
There are some points in time, and there are some time intervals defined by the center of the time interval and a maximum duration from the center, let it be 10 minutes for all of them. Centers may be at any duration from each other, points may be at any duration from each other. Looking to select all time intervals, together with one or zero points, so that each point is either not assigned or assigned to only one interval. If one point matches more than one interval, or vice versa, points shall be chosen so that the total duration between points and interval centers is minimized.
Sample data
interval
id centertime
1 2001-01-01 12.00 # starts at 11.50 ends at 12.10
2 2001-01-01 12.15 # starts at 12.05 ends at 12.25
3 2001-01-01 12.20 # starts at 12.10 ends at 12.30
point
id time
21 2001-01-01 12.00
22 2001-01-01 12.11
23 2001-01-01 12.17
24 2001-01-01 12.19
Desired results:
interval_id point_id
1 21
2 23
3 24
Explanation
Point 21 exactly matches center of interval 1, and nothing else, so is assigned.
Point 23 is closer to interval 2 than 3, but point 24 is even closer to 3, so interval 3 is assigned point 24.
Point 22 is the closest remaining point to interval 2, so is assigned.
Point 21 is within interval 2, but point 22 is available and closer, so 21 is not assigned to an interval and does not appear in results.
point 23 is even closer to 3, so 22 is the closest remaining one
Okay I got it.
It uses a lateral join to calculate the duration from each centertime to each point, and wraps that in an additional SELECT to get only the closest match using an ordered window function.