Find 10 closest points in decreasing order

Question

Find 10 closest points in decreasing order

848 Views Asked by Jerry George At 17 August 2025 at 21:11

I am trying to find the distance between a point and other 40,000 points.

Each point is a 300 dimension vector.

I am able to find the closes point. How do I find the 10 nearest points in decreasing order?

Function for closest point:

from scipy.spatial import distance
def closest_node(node,df):
    closest_index = distance.cdist([node],df.feature.tolist()).argmin()
    return pd.Series([df.title.tolist([closest_index],df.id.tolist()[closest_index]])

This command returns the closest title and id:

df3[["closest_title","closest_id"]]=df3.feature.apply(lambda row: closest_node(row,df2))

df2- pandas dataframe of 40,000 points (each 300 dimension)

How do I return the title and index for the 10 closest points

Thanks

Original Q&A

There are 2 best solutions below

ttreis On 04 October 2018 at 15:30

As I don't know your complete code of have a sample of data, here would be my suggestion:

Instead of using ".argmin()" just sort your list by distance and then return the first ten elements of the sorted list. Then find their indices like you're already doing it.

**chase** · Accepted Answer

Just slice the sorted distance matrix for the top 10 nodes. Something like this:

from scipy.spatial import distance

# Find the query node
query_node = df.iloc[10] ## Not sure what you're looking for

# Find the distance between this node and everyone else
euclidean_distances = df.apply(lambda row: distance.euclidean(row, query_node), axis=1)

# Create a new dataframe with distances.
distance_frame = pandas.DataFrame(data={"dist": euclidean_distances, "idx": euclidean_distances.index})
distance_frame.sort("dist", inplace=True)

# nodes
smallest_dist_ixs = distance_frame.iloc[1:10]["idx"]
most_similar_nodes = df.iloc[int(smallest_dist_ixs)]

My assumption based on the word 'title' you have used here, and the choice of 300 dimensional vectors, is that these are word or phrase vectors.
Gensim actually has a manner to get the top N number of similar words based on this idea, which is reasonably fast.

https://tedboy.github.io/nlps/generated/generated/gensim.models.Word2Vec.most_similar.html

>>> trained_model.most_similar(positive=['woman', 'king'], negative=['man'])
[('queen', 0.50882536), ...]

For something slightly different, this is also slightly similar to the traveling salesman problem (TSP) if you want to get shortest paths between all points, and then simply slice out the first 10 'cities'.

Google has a pretty simple and quick python implementation with OR-Tools here: https://developers.google.com/optimization/routing/tsp.

Find 10 closest points in decreasing order

There are 2 best solutions below

Related Questions in PYTHON

Related Questions in SCIPY

Related Questions in DISTANCE

Related Questions in NEAREST-NEIGHBOR

Related Questions in SCIPY-SPATIAL

Trending Questions

Popular # Hahtags

Popular Questions