I've got sequences of points with coordinates and other attributes that I'd like to convert to lines (shapely LineStrings).
The pandas DataFrame looks like this:
   Path   locIdx  Arr       Dep       PostLength   Long     Lat      geometry
0  32613  1       NaT       05:00:00  219.0326572  -1.3473  53.9396  POINT (-1.3473 53.9396)
1  32613  2       05:02:00  05:02:00  181.020583   -1.3433  53.9338  POINT (-1.3433 53.9338)
2  32613  3       05:03:00  05:03:00  440.4625762  -1.3435  53.9322  POINT (-1.3435 53.9322)
3  32613  4       05:05:00  05:05:00  551.3486222  -1.3454  53.9285  POINT (-1.3454 53.9285)
4  32613  5       05:06:00  05:06:00  575.912064   -1.347   53.9272  POINT (-1.347 53.9272)
5  32613  6       05:07:00  NaT       nan          -1.3519  53.9299  POINT (-1.3519 53.9299)
Converting to lines will obviously produce one line fewer than the number of points in a sequence, but I'd like to keep the point attributes (like PostLength) and also calculate some additional ones (like timeDiff = Arr - Dep, where Arr is taken from the next point in the sequence).
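For concreteness, the timeDiff rule (next point's Arr minus the current point's Dep) can be sketched in pandas with a per-Path shift. This is a minimal sketch assuming the rows are ordered by locIdx within each Path and that Arr/Dep are stored as timedeltas since midnight; only the columns the rule needs are included.

```python
import pandas as pd

# Minimal frame with only the columns this rule needs. Assumption: Arr/Dep
# hold times of day as timedeltas since midnight (NaT where missing).
df = pd.DataFrame({
    "Path": [32613] * 6,
    "locIdx": [1, 2, 3, 4, 5, 6],
    "Arr": pd.to_timedelta([None, "05:02:00", "05:03:00", "05:05:00", "05:06:00", "05:07:00"]),
    "Dep": pd.to_timedelta(["05:00:00", "05:02:00", "05:03:00", "05:05:00", "05:06:00", None]),
})

# Make sure points are in sequence order within each Path.
df = df.sort_values(["Path", "locIdx"]).reset_index(drop=True)

# Arrival time at the next point of the same Path (NaT for the last point).
next_arr = df.groupby("Path")["Arr"].shift(-1)
df["timeDiff"] = next_arr - df["Dep"]

# The last point of each Path starts no segment: cumcount(ascending=False)
# is 0 exactly for that row, so keep everything else (n points -> n-1 rows).
segments = df[df.groupby("Path").cumcount(ascending=False) > 0]
print(segments[["locIdx", "Dep", "timeDiff"]])
```

For the first row this gives 05:02:00 - 05:00:00 = 2 minutes; the shift is vectorized, so no explicit row iteration is needed.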
The ideas I've had include duplicating each record (row) and grouping the copies with something like what's described here:
from shapely.geometry import LineString
geo_df = geo_df.groupby(['Path', 'locIdx'])['geometry'].apply(lambda x: LineString(x.tolist()))
But this solution doesn't seem ideal (especially for calculating the difference between attributes in different rows), and I have a feeling that something better can be done. Should I just iterate over the DataFrame?
You can use a couple of options. In this solution I make use of itertools.pairwise and the .diff() method. To recreate your data:
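A minimal way to rebuild the sample, assuming Arr and Dep are times of day that can be held as timedeltas since midnight (so they can be subtracted). The geometry column is left out here because it can be rebuilt from Long/Lat.

```python
import numpy as np
import pandas as pd

# Sample from the question; Arr/Dep as timedeltas (assumption), NaT/NaN
# where the original shows NaT/nan.
df = pd.DataFrame({
    "Path": [32613] * 6,
    "locIdx": list(range(1, 7)),
    "Arr": pd.to_timedelta([None, "05:02:00", "05:03:00", "05:05:00", "05:06:00", "05:07:00"]),
    "Dep": pd.to_timedelta(["05:00:00", "05:02:00", "05:03:00", "05:05:00", "05:06:00", None]),
    "PostLength": [219.0326572, 181.020583, 440.4625762, 551.3486222, 575.912064, np.nan],
    "Long": [-1.3473, -1.3433, -1.3435, -1.3454, -1.347, -1.3519],
    "Lat": [53.9396, 53.9338, 53.9322, 53.9285, 53.9272, 53.9299],
})
print(df)
```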
And use this to get the desired result: