I have a table with data that covers time periods. The periods overlap.
I want to purge the data in order to save space, but without losing days of data.
Here is an example table:
Id | Start | End
1 | 2023.01.01 | 2023.01.05
2 | 2023.01.03 | 2023.01.07
3 | 2023.01.04 | 2023.01.09
4 | 2023.01.07 | 2023.01.13
5 | 2023.01.09 | 2023.01.14
6 | 2023.01.12 | 2023.01.15
In this example, I have several solutions :
1. Clean 2 and 5
2. Clean 3 and 5
The best solution is the second, because 2 and 3 don't have the same weight.
- Weight of Id=2 : 5 days
- Weight of Id=3 : 6 days
So I will gain more space if I took the second solution.
I have millions of rows, so I need an efficient algorithm.
Note that sometimes I have intervals with no data.
Do you have a name of algorithm to solve this problem please ?



If we separate all
Ntime intervals like[1..2],[3..3],[4..5]..[12..13],[14..14],[15..15]for the first picture, we can build a net (kind of directed graph with source and sink), where edges from source to intervals above have capacity 1, edges from intervals to time periods (covering intervals) have capacity 1, edges from time periods to sink have large enough (N) capacity, and cost of the last edges correspond to period weight.So Minimum-cost flow problem providing amount of flow
N