Finding row with closest numerical proximity within Pandas DataFrame

112 Views Asked by Torockus At 28 January 2020 at 15:21

I have a Pandas DataFrame with the following hypothetical data:

   ID    Time        X-coord      Y-coord
0  1      5          68          5
1  2      8          72          78 
2  3      1          15          23 
3  4      4          81          59 
4  5      9          78          99 
5  6     12          55          12 
6  7      5          85          14 
7  8      7          58          17 
8  9     13          91          47 
9  10    10          29          87

For each row (or ID), I want to find the ID with the closest proximity in time and space (X & Y) within this dataframe. Bonus: Time should have priority over XY. Ideally, in the end I would like to have a new column called "Closest_ID" containing the most proximal ID within the dataframe.

I'm having trouble coming up with a function for this.

I would really appreciate any help or hint that points me in the right direction!

Thanks a lot!

Original Q&A

There are 1 best solutions below

noamgot On 28 January 2020 at 15:40

Let's denote df as our dataframe. Then you can do something like:

from sklearn.metrics import pairwise_distances

space_vals = df[['X-coord', 'Y-coord']]
time_vals =df['Time']
space_distance = pairwise_distance(space_vals)
time_distance = pairwise_distance(time_vals)

space_distance[space_distance == 0] = 1e9 # arbitrary large number
time_distance[time_distance == 0] = 1e9 # again

closest_space_id = np.argmin(space_distance, axis=0)
closest_time_id = np.argmin(time_distance, axis=0)

Then, you can store the last 2 results in 2 columns, or somehow decide which one is closer.

Note: this code hasn't been checked, and it might have a few bugs...

Finding row with closest numerical proximity within Pandas DataFrame

There are 1 best solutions below

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in DATAFRAME

Trending Questions

Popular # Hahtags

Popular Questions