Modify this algorithm for Nearest Neighbour Search (NNS) to perform Approximate-NNS


From the slides of a course, I found these:

Given a set P in R^D and a query point q, its NN is the point p_0 in P such that:

dist(p_0, q) <= dist(p, q), for every p in P.

Similarly, with an approximation factor ε, where 0 < ε < 1, the ε-NN is a point p_0 such that:

dist(p_0, q) <= (1+ε) * dist(p, q), for every p in P.

(I wonder why ε can't reach 1).
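To make the definition concrete, here is a small numeric check (the points and the value of ε are just illustrative, not from the slides):

```python
import math

eps = 0.5
P = [(0.0, 3.0), (4.0, 0.0), (10.0, 10.0)]
q = (0.0, 0.0)

# true NN distance: the closest point is (0, 3) at distance 3.0
d_star = min(math.dist(p, q) for p in P)

# any point within (1 + eps) * d_star = 4.5 of q is a valid eps-NN
valid = [p for p in P if math.dist(p, q) <= (1 + eps) * d_star]
# valid == [(0.0, 3.0), (4.0, 0.0)]
```

So an ε-NN query is allowed to return (4, 0) even though (0, 3) is strictly closer.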

We build a KD-tree and then search for the NN with this algorithm: [image of the recursive KD-tree NN search pseudocode] which is correct, as far as my understanding and my testing go.
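For reference, here is my own rough Python sketch of that search (the tree layout and names are mine, not the slides' exact pseudocode):

```python
import math

def build(points, depth=0):
    """Build a KD-tree as nested tuples: (point, cut-dim, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    m = len(points) // 2
    return (points[m], axis,
            build(points[:m], depth + 1), build(points[m + 1:], depth + 1))

def nn_search(node, q, best=None):
    """Exact NN: descend toward q first, then visit the far child only
    when the splitting hyperplane is closer than the current best."""
    if node is None:
        return best
    p, axis, left, right = node
    if best is None or math.dist(p, q) < math.dist(best, q):
        best = p                           # update the current best at this node
    near, far = (left, right) if q[axis] <= p[axis] else (right, left)
    best = nn_search(near, q, best)
    # prune: skip the far side if the hyperplane is farther than current best
    if abs(q[axis] - p[axis]) < math.dist(best, q):
        best = nn_search(far, q, best)
    return best
```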

How should I modify the above algorithm, in order to perform Approximate Nearest Neighbour Search (ANNS)?

My thought is to multiply the current best distance (at the update step in the leaf) by ε and leave the rest of the algorithm as is. However, I am not sure if this is correct. Can someone explain?

PS - I understand how the search for the NN works.

Note that I asked on the Computer Science site, but got no answer.


Best answer:

The one modification needed is to replace current best distance in the pruning test with current best distance/(1+ε). This prunes exactly the nodes that cannot contain a point violating the new inequality.

The reason that this works is that (assuming that cut-coor(q) is on the left side) the test

cut-coor(q) + current best distance > node's cut-value

is checking whether the hyperplane separating left-child and right-child is closer to q than the current best distance. This is a necessary condition for a point in right-child to be closer than that to the query point q, since the line segment joining q to any point in right-child must pass through that hyperplane. By replacing d(p_0, q) = current best distance with current best distance/(1+ε), we are instead checking whether any point p on the right side could satisfy

d(p, q) < d(p_0, q)/(1+ε),

which is equivalent to

(1+ε) d(p, q) < d(p_0, q),

i.e., p would witness a violation of the approximate nearest neighbor guarantee. Pruning only those subtrees therefore preserves the guarantee.
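Concretely, the change is a single line in the pruning test. A self-contained Python sketch (the tree layout is my own, not the pseudocode from the asker's slides):

```python
import math

def build(points, depth=0):
    """KD-tree as nested tuples: (point, cut-dim, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    m = len(points) // 2
    return (points[m], axis,
            build(points[:m], depth + 1), build(points[m + 1:], depth + 1))

def ann_search(node, q, eps, best=None):
    """(1+eps)-approximate NN: identical to exact NN search except that
    the pruning test compares against current best / (1 + eps)."""
    if node is None:
        return best
    p, axis, left, right = node
    if best is None or math.dist(p, q) < math.dist(best, q):
        best = p
    near, far = (left, right) if q[axis] <= p[axis] else (right, left)
    best = ann_search(near, q, eps, best)
    # exact search would test: abs(q[axis] - p[axis]) < dist(best, q)
    if abs(q[axis] - p[axis]) < math.dist(best, q) / (1 + eps):
        best = ann_search(far, q, eps, best)
    return best
```

With eps = 0 this reduces to the exact search; a larger eps prunes more aggressively, and the returned point's distance is within a (1+eps) factor of the true minimum.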