Using Levenshtein distance as the metric, I want to find the exact k-nearest neighbours of every element in a large set of strings, but I am not yet sure how large a value of k I will need. Is there an algorithm or data structure that lets me defer this choice and increase k gradually, without paying significantly more in total than if I had computed the larger k up front? Ideally I would also like to be able to use different values of k for different elements; a sketch of the access pattern I have in mind follows.
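To make that concrete, here is a naive reference implementation of the semantics I want (the names and signatures are mine, not from any library): one iterator per query that yields neighbours nearest-first, so "increasing k" just means advancing it further. This version pays the full N distance calls per query up front, which is exactly the cost I am hoping a better structure can avoid.

```python
import heapq
from typing import Callable, Iterator, Tuple

def neighbours(query: str, strings: list[str],
               dist: Callable[[str, str], int]) -> Iterator[Tuple[int, str]]:
    """Yield (distance, string) pairs nearest-first; advancing the
    iterator further is how I would "increase k" for this element."""
    # Scoring every string up front is the O(N)-calls-per-query cost
    # that I want a smarter index to avoid.
    heap = [(dist(query, s), s) for s in strings]
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)

# it = neighbours(q, corpus, levenshtein)
# first_five = [next(it) for _ in range(5)]   # k = 5 for now
# five_more  = [next(it) for _ in range(5)]   # later decide on k = 10
```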
I have a number of data sets I could use, but the one I would like to use has 500,000 strings of roughly 100 characters each, which rules out any method that needs on the order of N^2 calls to the distance function.
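For a sense of scale: with the standard dynamic-programming Levenshtein, each call at my string lengths costs roughly 100 × 100 = 10^4 cell updates, and 500,000 strings give about 1.25 × 10^11 unordered pairs, i.e. on the order of 10^15 basic operations for the brute-force approach. A minimal sketch of that distance function:

```python
def levenshtein(a: str, b: str) -> int:
    """Standard two-row dynamic-programming Levenshtein distance:
    O(len(a) * len(b)) time, O(min(len(a), len(b))) space."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]
```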
I have tried a GNAT (Geometric Near-neighbor Access Tree), but found its k-NN queries a bit too slow, often approaching N distance-function calls per element.
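Per-query call counts like those can be measured with a generic counting wrapper along these lines (sketch; the GNAT implementation itself is not shown, and the wrapper works with any metric tree that takes the distance function as a parameter):

```python
class CountingDistance:
    """Generic wrapper that counts metric evaluations; pass it to the
    tree in place of the raw distance function."""
    def __init__(self, dist):
        self.dist = dist
        self.calls = 0

    def __call__(self, a: str, b: str) -> int:
        self.calls += 1
        return self.dist(a, b)

# counted = CountingDistance(levenshtein)
# ... build the GNAT and run one k-NN query using `counted` ...
# print(counted.calls)  # for my data this often approaches N
```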