From the slides of a course, I found the following:
Given a set P in R^D and a query point q, its NN is the point p_0 in P such that:
dist(p_0, q) <= dist(p, q), for every p in P.
Similarly, with an approximation factor ε, where 1 > ε > 0, the ε-NN is a point p_0 such that:
dist(p_0, q) <= (1 + ε) * dist(p, q), for every p in P.
(I wonder why ε can't reach 1).
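For concreteness, the ε-NN condition can be checked brute force with something like this (a sketch; the function name is mine, not from the slides):

```python
import math

def is_eps_nn(P, q, p0, eps):
    # p0 is an eps-NN of q iff dist(p0, q) <= (1 + eps) * dist(p, q)
    # for every p in P, i.e. p0 is within (1 + eps) times the exact NN distance.
    return all(math.dist(p0, q) <= (1 + eps) * math.dist(p, q) for p in P)
```

For example, with eps = 0.5 any p0 whose distance to q is at most 1.5 times the true NN distance passes the check.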
We build a KD-tree and then search for the NN with the standard recursive algorithm, which is correct as far as my reasoning and my testing go.
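Since the algorithm itself isn't reproduced here, this is a sketch of what I take the standard recursive KD-tree NN search to be (the node layout, field names, and leaf handling are my assumptions):

```python
import math

class Node:
    # One point per node; internal nodes also carry a splitting hyperplane.
    def __init__(self, point, cut_dim=None, cut_val=None, left=None, right=None):
        self.point = point
        self.cut_dim = cut_dim   # splitting dimension (internal nodes only)
        self.cut_val = cut_val   # splitting coordinate (internal nodes only)
        self.left = left
        self.right = right

def nn_search(node, q, best=(None, math.inf)):
    # Returns (point, distance) of the exact nearest neighbour of q.
    if node is None:
        return best
    d = math.dist(node.point, q)
    if d < best[1]:
        best = (node.point, d)             # update the current best
    if node.cut_dim is None:               # leaf: nothing below to explore
        return best
    diff = q[node.cut_dim] - node.cut_val  # signed distance of q to the hyperplane
    near, far = (node.left, node.right) if diff <= 0 else (node.right, node.left)
    best = nn_search(near, q, best)        # descend into q's side first
    if abs(diff) < best[1]:                # pruning test: visit the far side only if the
        best = nn_search(far, q, best)     # hyperplane is closer than the current best
    return best
```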
How should I modify the above algorithm in order to perform Approximate Nearest Neighbour Search (ANNS)?
My thought is to multiply the current best (at the update step in the leaf) by ε and leave the rest of the algorithm as is. I am not sure, however, if this is correct. Can someone explain?
PS - I understand how the search for the NN works.
Note that I asked on the Computer Science site, but got nothing!
The one modification needed is to replace `current best distance` with `current best distance / (1 + ε)`. This prunes the nodes that cannot contain a point violating the new inequality.

The reason this works is that (assuming `cut-coor(q)` is on the left side) the test is checking whether the hyperplane separating `left-child` and `right-child` is closer to q than `current best distance`, which is a necessary condition for a point in `right-child` to be closer than that to the query point `q`, since the line segment joining `q` and a point in `right-child` passes through that hyperplane.

By replacing `d(p_0, q) = current best distance` with `current best distance / (1 + ε)`, we're checking whether any point `p` on the right side could satisfy

d(p, q) < d(p_0, q) / (1 + ε),

which is equivalent to

d(p_0, q) > (1 + ε) * d(p, q),

which is a witness to the violation of the approximate nearest neighbor guarantee.
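Applied to a search structured like the sketch in the question, the change is confined to the pruning test (again a sketch, reusing the assumed `Node` layout from above):

```python
import math  # Node class as defined in the earlier sketch

def ann_search(node, q, eps, best=(None, math.inf)):
    # Same traversal as the exact search; the returned p0 satisfies
    # d(p0, q) <= (1 + eps) * d(p, q) for every p in the tree.
    if node is None:
        return best
    d = math.dist(node.point, q)
    if d < best[1]:
        best = (node.point, d)
    if node.cut_dim is None:
        return best
    diff = q[node.cut_dim] - node.cut_val
    near, far = (node.left, node.right) if diff <= 0 else (node.right, node.left)
    best = ann_search(near, q, eps, best)
    # The one modification: divide the current best distance by (1 + eps).
    # A subtree skipped by this test cannot contain a witness p with
    # d(p, q) < d(p0, q) / (1 + eps).
    if abs(diff) < best[1] / (1 + eps):
        best = ann_search(far, q, eps, best)
    return best
```

With eps = 0 this reduces to the exact search; larger eps prunes more aggressively, at the cost of possibly returning a point up to (1 + eps) times farther than the true NN.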