RAPIDS cuml KNeighbors: number of landmark samples must be >= k

Minimal reproducible example:

import cudf
from cuml.neighbors import KNeighborsRegressor
d = {
    'id':['a','b','c','d','e','f'],
    'latitude':[50,-22,13,37,43,14],
    'longitude':[3,-43,100,27,-4,121],
}
df = cudf.DataFrame(d)
knn = KNeighborsRegressor(n_neighbors = 4, metric = 'haversine')
knn.fit(df[['latitude','longitude']],df.index)
dists, nears = knn.kneighbors(df[['latitude','longitude']], return_distance = True)

This throws the error "number of landmark samples must be >= k". The whole trace is:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_33/1073358290.py in <module>
     10 knn = KNeighborsRegressor(n_neighbors = 4, metric = 'haversine')
     11 knn.fit(df[['latitude','longitude']],df.index)
---> 12 dists, nears = knn.kneighbors(df[['latitude','longitude']], return_distance = True)

/opt/conda/lib/python3.7/site-packages/cuml/internals/api_decorators.py in inner_get(*args, **kwargs)
    584 
    585                 # Call the function
--> 586                 ret_val = func(*args, **kwargs)
    587 
    588             return cm.process_return(ret_val)

cuml/neighbors/nearest_neighbors.pyx in cuml.neighbors.nearest_neighbors.NearestNeighbors.kneighbors()

cuml/neighbors/nearest_neighbors.pyx in cuml.neighbors.nearest_neighbors.NearestNeighbors._kneighbors()

cuml/neighbors/nearest_neighbors.pyx in cuml.neighbors.nearest_neighbors.NearestNeighbors._kneighbors_dense()

RuntimeError: exception occured! file=_deps/raft-src/cpp/include/raft/spatial/knn/detail/ball_cover.cuh line=326: number of landmark samples must be >= k
Obtained 64 stack frames
...

I have been trying to get around this error for days, but the only way I know is to convert the cudf DataFrame to a pandas DataFrame and use sklearn, which works perfectly:

import pandas as pd
from sklearn.neighbors import KNeighborsRegressor
d = {
    'id':['a','b','c','d','e','f'],
    'latitude':[50,-22,13,37,43,14],
    'longitude':[3,-43,100,27,-4,121],
}
df = pd.DataFrame(d)
knn = KNeighborsRegressor(n_neighbors = 4, metric = 'haversine')
knn.fit(df[['latitude','longitude']],df.index)
dists, nears = knn.kneighbors(df[['latitude','longitude']], return_distance = True)
dists

This gives us the distances array. Can you help me find a pure RAPIDS solution?

UPDATE: I found out that it works when the number of neighbors is <= len(data) // 2, so with the 6 rows above it works for n_neighbors <= 3.

UPDATE: It's a bug, and an issue has been opened for it. Until the issue gets resolved, we can pass algorithm='brute' to KNeighborsRegressor as a workaround.
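For anyone who wants to sanity-check what the brute-force workaround returns, here is a minimal pure-Python sketch of an exhaustive haversine neighbor search on the same six points. It is only a reference for comparison, not what cuML does internally. The helper names haversine and brute_kneighbors are my own, and I am assuming the metric expects (latitude, longitude) in radians, as the scikit-learn documentation describes for its haversine metric, so the degrees are converted first.

```python
import math

def haversine(p, q):
    """Great-circle distance on the unit sphere; p and q are (lat, lon) in radians."""
    dlat = q[0] - p[0]
    dlon = q[1] - p[1]
    a = (math.sin(dlat / 2) ** 2
         + math.cos(p[0]) * math.cos(q[0]) * math.sin(dlon / 2) ** 2)
    # Clamp to guard against tiny floating-point overshoot before asin.
    return 2 * math.asin(math.sqrt(min(1.0, a)))

def brute_kneighbors(points, k):
    """Exhaustive search: rank every point against every point (including itself)."""
    if k > len(points):
        raise ValueError("k must not exceed the number of samples")
    dists, idxs = [], []
    for p in points:
        ranked = sorted((haversine(p, q), j) for j, q in enumerate(points))
        dists.append([d for d, _ in ranked[:k]])
        idxs.append([j for _, j in ranked[:k]])
    return dists, idxs

latitude = [50, -22, 13, 37, 43, 14]
longitude = [3, -43, 100, 27, -4, 121]
# Assumption: convert degrees to radians before computing haversine distances.
points = [(math.radians(la), math.radians(lo)) for la, lo in zip(latitude, longitude)]

dists, nears = brute_kneighbors(points, k=4)

# The cuML workaround from the update above would be (needs a GPU, not run here):
# knn = KNeighborsRegressor(n_neighbors=4, metric='haversine', algorithm='brute')
```

The distances are angles on the unit sphere, so multiply by the Earth's radius (about 6371 km) to get kilometres. Each point is its own nearest neighbor at distance 0, which matches querying kneighbors with the training data itself.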
