I am trying to create a representation of Amsterdam's canals based on a very large data set of coordinates sent through AIS. Because the AIS is sometimes calibrated incorrectly, some coordinates do not lie on an actual canal but on urban structures instead. Luckily, this happens relatively rarely, so these data points are not in close proximity to other data points or clusters of data points. I therefore want to exclude every data point that does not have a 'neighbour' within a given margin (say 5 meters in real life), in the most pythonic way. Does anyone know how to approach this problem? My data is a simple pandas DataFrame:
              lng        lat
0        4.962218  52.362260
1        4.882198  52.406013
2        4.918583  52.335535
3        4.908185  52.381353
4        5.020983  52.277188
...           ...        ...
2249835  4.979960  52.352660
2249836  4.914533  52.334980
2249837  4.856630  52.401977
2249838  4.971418  52.357525
2249839  5.042353  52.402142

[2211095 rows x 2 columns]
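A minimal sketch of one possible approach, assuming SciPy is available. It converts `lng`/`lat` to approximate metres with an equirectangular projection around the mean latitude (an assumption that is adequate over a city-sized area), builds a k-d tree with `scipy.spatial.cKDTree`, and keeps only the rows whose nearest *other* point lies within the margin. The helper names `to_local_metres` and `drop_isolated_points` and the 5 m default are hypothetical, chosen for illustration only.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; assumes a spherical Earth


def to_local_metres(lng, lat):
    """Project lng/lat degrees to approximate x/y metres (equirectangular).

    Assumption: the data covers a small area (one city), so scaling the
    longitude by cos(mean latitude) is accurate enough.
    """
    lat0 = np.radians(np.mean(lat))
    x = np.radians(lng) * EARTH_RADIUS_M * np.cos(lat0)
    y = np.radians(lat) * EARTH_RADIUS_M
    return np.column_stack([x, y])


def drop_isolated_points(df, max_dist_m=5.0):
    """Return only the rows that have another point within max_dist_m metres."""
    xy = to_local_metres(df["lng"].to_numpy(), df["lat"].to_numpy())
    tree = cKDTree(xy)
    # k=2: the first hit is the point itself (distance 0); the second is its
    # nearest other point. If no second point exists, SciPy reports inf.
    dist, _ = tree.query(xy, k=2)
    has_neighbour = dist[:, 1] <= max_dist_m
    return df[has_neighbour]


if __name__ == "__main__":
    # Toy example: the first two points are roughly 2 m apart, the third is
    # several kilometres away and should be dropped.
    sample = pd.DataFrame({
        "lng": [4.900000, 4.900030, 4.950000],
        "lat": [52.370000, 52.370000, 52.370000],
    })
    print(drop_isolated_points(sample, max_dist_m=5.0))
```

The whole filter is a single vectorised tree query, which should scale to a few million rows. scikit-learn's `BallTree` with `metric="haversine"` (inputs as `[lat, lng]` in radians) is an alternative that skips the projection step. Note that a vessel reporting the same wrong position repeatedly would still form a small "cluster", so the threshold may need tuning against the data.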
The map currently looks as follows; I have marked examples of the coordinates I want to filter out:
