Convert sequence of points (pandas df) to lines

73 Views Asked by svn At 11 December 2023 at 14:36

I've got sequence(s) of points with coordinates and other attributes that I'd like to convert to lines (shapely LineStrings). The pandas DataFrame looks like this:

   Path   locIdx  Arr       Dep       PostLength   Long     Lat      geometry
0  32613  1       NaT       05:00:00  219.0326572  -1.3473  53.9396  POINT (-1.3473 53.9396)
1  32613  2       05:02:00  05:02:00  181.020583   -1.3433  53.9338  POINT (-1.3433 53.9338)
2  32613  3       05:03:00  05:03:00  440.4625762  -1.3435  53.9322  POINT (-1.3435 53.9322)
3  32613  4       05:05:00  05:05:00  551.3486222  -1.3454  53.9285  POINT (-1.3454 53.9285)
4  32613  5       05:06:00  05:06:00  575.912064   -1.347   53.9272  POINT (-1.347 53.9272)
5  32613  6       05:07:00  NaT       nan          -1.3519  53.9299  POINT (-1.3519 53.9299)

The conversion will obviously produce one line fewer than the number of points in the sequence, but I'd like to keep the point attributes (like PostLength) and also calculate some additional ones (like timeDiff = Arr - Dep, where Arr is taken from the next item).

One idea I had is to duplicate each record (row) and group the copies with something like this:

geo_df = geo_df.groupby(['Path', 'locIdx'])['geometry'].apply(lambda x: LineString(x.tolist()))

But this solution doesn't seem ideal (especially when calculating the difference of two attributes from different rows), and I have a feeling that something better can be done. Maybe I should just iterate over the DataFrame?


There is 1 best solution below

Victor Savenije On 14 December 2023 at 17:29

There are a couple of options. In this solution I make use of itertools.pairwise and the .diff() method.

To recreate your data:

import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

data = {
    'Path': [32613, 32613, 32613, 32613, 32613, 32613],
    'locIdx': [1, 2, 3, 4, 5, 6],
    'Arr': ['05:00:00', '05:02:00', '05:03:00', '05:05:00', '05:06:00', '05:07:00'],
    'PostLength': [219.0326572, 181.020583, 440.4625762, 551.3486222, 575.912064, None],
    'Long': [-1.3473, -1.3433, -1.3435, -1.3454, -1.347, -1.3519],
    'Lat': [53.9396, 53.9338, 53.9322, 53.9285, 53.9272, 53.9299],
}
df = pd.DataFrame(data)

geometry = [Point(xy) for xy in zip(df['Long'], df['Lat'])]
geo_df = gpd.GeoDataFrame(df, geometry=geometry)

And use this to get the desired result:

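import itertools
import shapely
# itertools.pairwise requires Python 3.10+; the top-level shapely.LineString assumes Shapely 2.x.
gdf_diff = gpd.GeoDataFrame({"timeDiff": pd.to_datetime(geo_df.Arr).diff()[1:].reset_index(drop=True),
                             "PostLength": geo_df["PostLength"].iloc[:-1]},
                            geometry=[shapely.LineString(points) for points in itertools.pairwise(geo_df.geometry)])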
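The snippet above treats the whole frame as one sequence and measures time between consecutive arrivals. If the frame holds several Path values and a Dep column, as in the question, one possible variant is to look up the next point per Path with a grouped shift(-1) and compute timeDiff as next Arr minus current Dep. This is only a sketch: the sample values below are invented, and `nxt`, `has_next`, `segments` and `lines` are names chosen for this example.

```python
import pandas as pd
from shapely.geometry import LineString

# Hypothetical sample with two paths; values are illustrative only.
df = pd.DataFrame({
    "Path": [1, 1, 1, 2, 2],
    "locIdx": [1, 2, 3, 1, 2],
    "Arr": [None, "05:02:00", "05:04:00", None, "06:05:00"],
    "Dep": ["05:00:00", "05:02:30", None, "06:00:00", None],
    "PostLength": [219.0, 181.0, None, 440.0, None],
    "Long": [-1.3473, -1.3433, -1.3435, -1.3454, -1.3470],
    "Lat": [53.9396, 53.9338, 53.9322, 53.9285, 53.9272],
})
df = df.sort_values(["Path", "locIdx"]).reset_index(drop=True)
df["Arr"] = pd.to_timedelta(df["Arr"])
df["Dep"] = pd.to_timedelta(df["Dep"])

# Look up the next point's values within the same path.
nxt = df.groupby("Path")[["Arr", "Long", "Lat"]].shift(-1)

# The last point of each path has no successor, so it starts no segment.
has_next = nxt["Long"].notna()

# Keep the attributes of the segment's start point.
segments = df.loc[has_next, ["Path", "locIdx", "PostLength"]].copy()
segments["timeDiff"] = nxt.loc[has_next, "Arr"] - df.loc[has_next, "Dep"]

# One two-point LineString per segment, in the same row order as `segments`.
lines = [
    LineString([(x0, y0), (x1, y1)])
    for x0, y0, x1, y1 in zip(
        df.loc[has_next, "Long"], df.loc[has_next, "Lat"],
        nxt.loc[has_next, "Long"], nxt.loc[has_next, "Lat"],
    )
]
print(segments)
```

The result can then be wrapped the same way as above, for example with gpd.GeoDataFrame(segments, geometry=lines). Because the shift is done per group, no segment ever joins the last point of one Path to the first point of the next.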
