Given a data set of a few millions of price ranges, we need to find the smallest range that contains a given price.
The following rules apply:
- Ranges can be fully nested (ie, 1-10 and 5-10 is valid)
- Ranges cannot be partially nested (ie, 1-10 and 5-15 is invalid)
Example:
Given the following price ranges:
- 1-100
- 50-100
- 100-120
- 5-10
- 5-20
The result for searching price 7 should be 5-10
The result for searching price 100 should be 100-120 (smallest range containing 100).
What's the most efficient algorithm/data structure to implement this?
Searching the web, I only found solutions for searching ranges within ranges.
I've been looking at Morton count and Hilbert curve, but can't wrap my head around how to use them for this case.
Thanks.
Because you did not mention this ad hoc algorithm, I'll propose this as a simple answer to your question:
This is a python function, but it's fairly easy to understand and convert it in another language.
I believe there are more efficient and complicated solutions (if you can do some precomputation for example) but this is the first step you must take.
EDIT : Here is a better solution, probably in O(log(n)) but it's not trivial. It is a tree where each node is an interval, and has a child list of all strictly non overlapping intervals that are contained inside him. Preprocessing is done in O(n log(n)) time and queries are O(n) in worst case (when you can't find 2 ranges that don't overlap) and probably O(log(n)) in average.
2 classes: Tree that holds the tree and can query:
Node that holds the structure and information:
and to test that: