This problem arises from creating closest pair of nodes from two sets/surfaces in a FEM mesh. Here the basic logic of what I am trying to do:
I have two surfaces/sets of nodes which are coplanar (can be treated as 2D even if it's a 3D space). From each node from the first set, I'd like to find the closest node in the second set.
Example:
Set 1: (0,0), (1,0), (2,0), (3,0) (4 points)
Set 2: (1.1,0), (0.1,0), (3.1,0), (2.1,0), (2.2,0) (5 points to query)
result vector -> (2,1,4,3,3) (index starts at 1)
I have been looking at many available open-source algorithms (similar to this benchmark: https://github.com/erikbern/ann-benchmarks), both exact and approximate ones. However, before trying them one by one, I was wondering if any of you had any experience/advice for the algorithm I am looking for. I need only the closest node and the number of nodes can vary anywhere to 100k nodes in a 2D space. Speed and efficiency are the major factor.
Thank you in advance!
I have been looking at many algorithms and decided to try a few; however, I'd like to hear the opinion of others as they might recommend some algorithms compared to others. As of right now, I was going to use this ANN (http://www.cs.umd.edu/~mount/ANN/) which seems much simpler and lighter than other massive open source projects.
As I understand, the ann-benchmarks are for high dimensionality (i.e. 32 dimensions or more). However you are looking for a 2D algorithm.
If you have access to a GPU, the fastest way is probably a GPU based approach, but that comes with some complexity in terms of requiring a larger project to depend on (which you want to avoid I think).
Therefore I would go with one of the traditional spatial indexes: R-tree, quadtree or kd-tree. LSH or a grid index may also be an option.
Now there are several points to consider:
As for LSH, I have very little experience with that you may have to look it up yourself.
As for gridindexes, I am not aware of any open-source implementation, but they a easy to implement yourself (basically you partition your data into a grid (e.g. 10m x 10m). Every grid cell simply contains a list or set of you points. For nearest neighbor search you then have to brute-force-search all points in a given cell and its neighbors (if the closest neighbor is closer than the closest point inside a cell).
I compared several Java implementations here: PDF, 1-NN-queries on 2D point data are in figure 28a, update rates are in figure 40a), index-build performance is in figure 2a). As reference I used an openstreetmap dataset representing the Alps in Europe.