Finding row with closest numerical proximity within Pandas DataFrame

110 Views Asked by At

I have a Pandas DataFrame with the following hypothetical data:

   ID    Time        X-coord      Y-coord
0  1      5          68          5
1  2      8          72          78 
2  3      1          15          23 
3  4      4          81          59 
4  5      9          78          99 
5  6     12          55          12 
6  7      5          85          14 
7  8      7          58          17 
8  9     13          91          47 
9  10    10          29          87 

For each row (or ID), I want to find the ID with the closest proximity in time and space (X & Y) within this dataframe. Bonus: Time should have priority over XY. Ideally, in the end I would like to have a new column called "Closest_ID" containing the most proximal ID within the dataframe.

I'm having trouble coming up with a function for this.

I would really appreciate any help or hint that points me in the right direction!

Thanks a lot!

1

There are 1 best solutions below

1
On

Let's denote df as our dataframe. Then you can do something like:

from sklearn.metrics import pairwise_distances

space_vals = df[['X-coord', 'Y-coord']]
time_vals =df['Time']
space_distance = pairwise_distance(space_vals)
time_distance = pairwise_distance(time_vals)

space_distance[space_distance == 0] = 1e9 # arbitrary large number
time_distance[time_distance == 0] = 1e9 # again

closest_space_id = np.argmin(space_distance, axis=0)
closest_time_id = np.argmin(time_distance, axis=0)

Then, you can store the last 2 results in 2 columns, or somehow decide which one is closer.

Note: this code hasn't been checked, and it might have a few bugs...