Efficient kNN graph construction with deferred selection of k

84 Views Asked by At

Using Levenshtein distance as a metric, I want to find the exact k-nearest neighbours for all elements in a large set of strings, but I am not yet sure how high a value of k I will require. Is there an algorithm or data structure that will allow me to defer this selection, and gradually increase k without significant efficiency costs over calculating the higher value of k in the first place? I would like the flexibility of being able to use different values of k for different elements, if possible.

I have a number of data-sets I could use, but I'd like to use one with 500000 strings, each approximately 100 characters, which would make methods approaching O(N^2) calls to the distance function take too long.

I have tried using GNAT, but found knn queries a bit too slow (often approaching N distance function calls per element).

0

There are 0 best solutions below