Optimize scipy nearest neighbor search

651 Views Asked by ds_user At 20 October 2024 at 21:42

I am trying to find all the nearest neighbors which are within 1 KM radius. Here is my script to construct tree and search the nearest points,

from pysal.cg.kdtree import KDTree

def construct_tree(s):
    data_geopoints = [tuple(x) for x in s[['longitude','latitude']].to_records(index=False)]
    tree = KDTree(data_geopoints, distance_metric='Arc', radius=pysal.cg.RADIUS_EARTH_KM)
    return tree

def get_neighbors(s,tree):
    indices = tree.query_ball_point(s, 1)
    return indices

#Constructing the tree for search
tree = construct_tree(data)

#Finding the nearest neighbours within 1KM
data['neighborhood'] = data['lat_long'].apply(lambda row: get_neighbors(row,tree))

From what I read in pysal page, it says -

kd-tree built on top of kd-tree functionality in scipy. If using scipy 0.12 or greater uses the scipy.spatial.cKDTree, otherwise uses scipy.spatial.KDTree.

In my case it should be using cKDTree. This is working fine for a sample dataset, but since the tree.query_ball_point returns the list of indices as a result. Each list will have 100s of elements. For my data points (2 Million records), this is growing bigger and bigger and stops due to memory issue after certain point. Any idea on how to solve this?

Original Q&A

There are 1 best solutions below

ds_user On 13 August 2017 at 22:39 BEST ANSWER

Just in case if anyone looking for an answer for this, I have solved it by finding the nearest neighbours for a group (tree.query_ball_point can handle batches) and write in to database and then process next group, rather than keeping all in memory. Thanks.

Optimize scipy nearest neighbor search

There are 1 best solutions below

Related Questions in PYTHON

Related Questions in PYTHON-2.7

Related Questions in SCIPY

Related Questions in NEAREST-NEIGHBOR

Related Questions in PYSAL

Trending Questions

Popular # Hahtags

Popular Questions