There seems to be a similar question here, but I was satisfied with neither the clarity of the answer, nor its practicality. I was asked in a recent interview what data structure I would store my large set of floating point numbers so that I look up a new arrival for itself or its closest neighbor. I said I would use a binary search tree, and try to make it balanced to achieve O(log n).
Then the question was extended to two dimensions: What data structure would I use to store a large set of (x,y) pairs such as geographical coordinates for fast look up? I couldn't think of a satisfactory answer and totally gave up when extended to K-dimensions. Using a k-dimensional tree directly that uses coordinates values to "split" the space doesn't seem to work since two close points near the origin but inside different quadrants may end up on far far away leaves.
After the interview, I remembered Voronoi diagrams for partitioning K-dimensional space nicely. What is the best way of implementing this using what data structure? How are the look ups going to be performed? I feel like this question is so common in computer science that by now it has even a dedicated data structure.
You can use a grid and sort the points into the grid-cells. There is a similar question here:Distance Calculation for massive number of devices/nodes. You can also use a space filling curve (quadkey) or a quadtree,e.g. r-tree when you need additional informations,e.g hierarchy.