To effectively find n nearest neighbors of a point in d-dimensional space, I selected the dimension with greatest scatter (i.e. in this coordinate differences between points are largest). The whole range from minimal to maximal value in this dimension was split into k bins. Each bin contains points which coordinates (in this dimensions) are within the range of that bin. It was ensured that there are at least 2n points in each bin. The algorithm for finding n nearest neighbors of point x is following:
- Identify bin kx,in which point x lies(its projection to be precise).
- Compute distances between x and all the points in bin kx.
- Sort computed distances in ascending order.
- Select first n distances. Points to which these distances were measured are returned as n nearest neighbors of x.
This algorithm is not working for all cases. When algorithm can fail to compute nearest neighbors? Can anyone propose modification of the algorithm to ensure proper operation for all cases?
This is failing because (any of) the k nearest neighbors of x could be in a different bin than x.