The only reasonable slide set I found is this, which in page 15 says, for building:
- Sort all points by their x coordinate value and store them in the leaf nodes of a balanced binary tree (i.e., a range tree)
- Starting at the root, each node contains the point in its subtree with the maximum value for its y coordinate that has not been stored
at a shallower depth in the tree; if no such node exists, then node
is empty
I implemented a Range Tree just before, so based on the example I used there, I now have (for a Priority Search Tree):
7
------ 0 ---------------------
| |
6 6
---- -3 ---- ---- 4 ------
| | | |
4 2 1 3
---- -4 - -2 --- 3 --- 5
| | / \ | | / \
0 (-3,4) (-2,2)(0,7) NIL (4,-4) (5,3)(6,-1)
-5 2
/ \ / \
(-5,6) (-4,0) (2,1) (3,6)
Here, every inner node is of the form:
mid_x
max y
and the range query I am posing is (-inf, 1) x (-0.5, 5), i.e. (x_low, x_high) x (y_low, y_high).
- So, let's start from the root, I check if x (=0) is in (-inf, 1) and if 7 > (-0.5, 5). It is, thus I proceed in both children, but for simplicity, let me follow the left one (in all cases from now).
- I check if -3 is the x range and if 6 is more or equal than the upper bound of the y range of the query (i.e. 5). It is, so I continue.
- Same for the next level, thus we go to the next level and now please focus on this inner node. It has as max y a 0, which indicates that the max y of the subtree is 0, which is incorrect (left child is (-5, 6))*.
What am I missing in building/searching process?
In other words:
Check the leftmost branch of the tree; it has as max_y values (2nd bullet in the quote), 7, 6, 4, 0.
But isn't that value the one that indicated the maximum y value stored in the subtree under the inner node? If so, this does not hold for 0 and point (-5, 6) (6 is not used, since its used in the 2nd level).
*The particular query I am posing might not be damaged by that, but another one can.
You last logic is actually still correct. The (-5,6) value should've already been picked up when you hit the node you labelled (6,-3). Now, I'm no math major. But the general idea is this. In the Priority Search tree as you implemented, you're actually searching on two separate criteria. For x, it's a simple binary search for the range. For y, you're actually searching for it as a priority tree (good for search of y:inf or -inf:y, depending on your whether you use max or min.)
Note that at the bottom of page 15, it states that the tree is good for a semi-infinite range (one is infinite). Keep reading down, you'll see how the tree is optimized for semi-infinite range for y.
In short, since your tree is constructed with x as the binary search and y as a priority (using max remaining value), the optimal search is [x1:x2],[y1:inf].
Each node in the tree essentially stores 2 things. 1. The mid-point of x (or the max-x in the left tree, and the decision to traverse left or right is based on the >x check). 2. The POINT with the highest y value in the subtree that had not been added to previous one.
The search algorithm essentially goes like this. Starting from the root using a criteria of [x1:x2], [y1:inf] 1. Check the y-value of the POINT assigned to the node. If y > y1, go to 2, otherwise STOP traversing down (since the POINT assigned to the node has the highest y value, if the check failed, there's no other node beneath it that can fulfill [y1:inf]. 2. Check if the x-value of the point is in range of [x1:x2]. If so, include that point in the output. Continue to step 3 regardless if you included that point. 3. Check the node's "mid-point" value (let's call that mx). If mx is in range of [x1:x2], traverse down both path. If mx is < [x1:x2], traverse left. Otherwise traverse right. Go back to step 1 for each path you traverse.
EDIT for very, VERY long text: Let's run through your example. I've made an additional annotation marking each point using letter (the point's "name"). Each node now have the format of name of the point, with it's y-value in the parenthsis, and the "mid-range" x below it. For the leaf nodes, those with an '*' means they are already assigned to somewhere up the tree, and should be treated as NIL if you ever reach them.
Let's run an example query of [-2:4] [1:inf](or any y >= 1)
The result is "E,F,G" are in range.