Excluding data points based on proximity in a scatterplot


I am trying to create a representation of Amsterdam's canals based on a very large data set of coordinates sent through AIS. As the AIS is sometimes miscalibrated, some coordinates fall not on the actual canal but on urban structures. Luckily, this happens relatively rarely, so these data points are not in close proximity to other data points or clusters of data points. I therefore want to exclude the data points that do not have a 'neighbour' within a given margin (say 5 meters in real life), in the most pythonic way. Would anyone know how to approach this problem? My data is a simple pandas dataframe:

              lng        lat
0        4.962218  52.362260
1        4.882198  52.406013
2        4.918583  52.335535
3        4.908185  52.381353
4        5.020983  52.277188
...           ...        ...
2249835  4.979960  52.352660
2249836  4.914533  52.334980
2249837  4.856630  52.401977
2249838  4.971418  52.357525
2249839  5.042353  52.402142

[2211095 rows x 2 columns]

The map currently looks as follows; I have marked examples of coordinates I want to filter out / exclude:

Coordinates (examples) in need of excluding highlighted in yellow
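
One possible way to approach the filter described above is a grid-based neighbour search. The sketch below is not a definitive solution: it assumes the coordinates are WGS84 decimal degrees, converts them to metres with a simple equirectangular approximation (reasonable over a city-sized area, but not a proper projection), and treats two pings at identical coordinates as neighbours of each other. The names `to_local_metres`, `has_neighbour_mask` and `radius_m` are made up for this illustration.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (approximation)


def to_local_metres(lng, lat, ref_lat):
    """Convert degrees to metres with an equirectangular approximation.

    Assumption: the area is small enough (a single city) that scaling
    longitude by cos(ref_lat) is accurate to well under the 5 m margin.
    """
    x = math.radians(lng) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat) * EARTH_RADIUS_M
    return x, y


def has_neighbour_mask(points, radius_m=5.0):
    """Return a list of bools, True where a point has another point
    within radius_m metres.

    points: sequence of (lng, lat) pairs in decimal degrees.
    """
    if not points:
        return []

    ref_lat = sum(lat for _, lat in points) / len(points)
    xy = [to_local_metres(lng, lat, ref_lat) for lng, lat in points]

    # Bucket points into square cells with side radius_m. Any neighbour
    # within radius_m must then sit in the same cell or one of the 8
    # cells surrounding it, so only those cells need to be checked.
    grid = defaultdict(list)
    for i, (x, y) in enumerate(xy):
        grid[(math.floor(x / radius_m), math.floor(y / radius_m))].append(i)

    r2 = radius_m * radius_m

    def near_any(i, cx, cy):
        xi, yi = xy[i]
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # .get avoids creating empty cells while iterating the grid
                for j in grid.get((cx + dx, cy + dy), ()):
                    if j == i:
                        continue
                    if (xy[j][0] - xi) ** 2 + (xy[j][1] - yi) ** 2 <= r2:
                        return True
        return False

    mask = [False] * len(xy)
    for (cx, cy), members in grid.items():
        for i in members:
            mask[i] = near_any(i, cx, cy)
    return mask
```

With the dataframe from the question, the mask could be built as `mask = has_neighbour_mask(list(zip(df["lng"], df["lat"])))` and applied as `df.loc[mask]`. This pure-Python version is only meant to show the idea; for roughly 2.2 million rows, a compiled spatial index such as `scipy.spatial.cKDTree` on the projected coordinates would likely be considerably faster. Note also that a pair of stray pings close to each other would keep each other alive, so requiring more than one neighbour may be worth trying.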
