Expanding MultiPolygons in a GeoPandas DataFrame

I have a shapefile that contains both Polygons and MultiPolygons, as follows:

   name                                           geometry
0  AB10  POLYGON ((-2.116454759005259 57.14656265903432...
1  AB11  (POLYGON ((-2.052573095588467 57.1342600856536...
2  AB12  (POLYGON ((-2.128066321470298 57.0368357386797...
3  AB13  POLYGON ((-2.261525922489881 57.10693578217748...
4  AB14  POLYGON ((-2.261525922489879 57.10693578217748...

The 2nd and 3rd rows are MultiPolygons, while the rest are Polygons. I would like to expand each MultiPolygon row into multiple Polygon rows, as follows:

   name                                           geometry
0  AB10  POLYGON ((-2.116454759005259 57.14656265903432...
1  AB11  POLYGON ((-2.052573095588467 57.1342600856536...
2  AB11  POLYGON ((-2.045849648028651 57.13076387483844...
3  AB12  POLYGON ((-2.128066321470298 57.0368357386797...
4  AB12  POLYGON ((-2.096125852304303 57.14808092585477
5  AB13  POLYGON ((-2.261525922489881 57.10693578217748...
6  AB14  POLYGON ((-2.261525922489879 57.10693578217748...

Note that the AB11 and AB12 MultiPolygons have each been expanded into multiple rows, one row per polygon.

I think this is a GeoPandas data-manipulation task. Is there a Pythonic way to achieve the above?

Thank you!

My current solution has two steps.

Step 1: go through each row and, if the geometry is a MultiPolygon, turn it into a list of its polygons with a list comprehension:

   name                                           geometry
0  AB10  POLYGON ((-2.116454759005259 57.14656265903432...
1  AB11  [POLYGON ((-2.052573095588467 57.1342600856536...
2  AB12  [POLYGON ((-2.128066321470298 57.0368357386797...
3  AB13  POLYGON ((-2.261525922489881 57.10693578217748...
4  AB14  POLYGON ((-2.261525922489879 57.10693578217748...
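Step 1 could look like the following minimal sketch. It assumes shapely 2.x (where `MultiPolygon.geoms` yields the member polygons) and uses two hypothetical squares in place of the shapefile; `split_multi` is an illustrative helper, not part of any library.

```python
import pandas as pd
from shapely.geometry import MultiPolygon, Polygon

# Hypothetical stand-in data: one Polygon row and one MultiPolygon row.
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
shifted = Polygon([(2, 0), (3, 0), (3, 1), (2, 1)])
df = pd.DataFrame({
    "name": ["AB10", "AB11"],
    "geometry": [square, MultiPolygon([square, shifted])],
})

def split_multi(geom):
    # MultiPolygon -> list of its member polygons; anything else is left as-is.
    if geom.geom_type == "MultiPolygon":
        return [poly for poly in geom.geoms]
    return geom

# A plain pandas DataFrame is used here because a GeoDataFrame's geometry
# column is expected to hold geometries, not Python lists.
df["geometry"] = df["geometry"].apply(split_multi)
print(df)
```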

Step 2: use the usual trick for expanding a list in a row into multiple rows:

df.set_index(['name'])['geometry'].apply(pd.Series).stack().reset_index()

  name  level_1                                                  0
0  AB10        0  POLYGON ((-2.116454759005259 57.14656265903432...
1  AB11        0  POLYGON ((-2.052573095588467 57.13426008565365...
2  AB11        1  POLYGON ((-2.045849648028651 57.13076387483844...
3  AB12        0  POLYGON ((-2.128066321470298 57.0368357386797,...
4  AB12        1  POLYGON ((-2.096125852304303 57.14808092585477...
5  AB13        0  POLYGON ((-2.261525922489881 57.10693578217748...
6  AB14        0  POLYGON ((-2.261525922489879 57.10693578217748...
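Step 2 can also be done with `DataFrame.explode` (pandas 0.25 or later), which emits one row per list element and passes scalar cells through unchanged. A minimal sketch, with hypothetical strings standing in for polygons and lists standing in for the output of step 1:

```python
import pandas as pd

# Stand-in data: strings play the role of polygons, lists the role of
# MultiPolygons already split into their parts by step 1.
df = pd.DataFrame({
    "name": ["AB10", "AB11", "AB12"],
    "geometry": ["poly_a", ["poly_b1", "poly_b2"], ["poly_c1", "poly_c2"]],
})

# explode() emits one row per list element; scalars pass through unchanged.
# The original index is repeated for split rows, so reset it to 0..n-1.
out = df.explode("geometry").reset_index(drop=True)
print(out)
```

Unlike `apply(pd.Series).stack()`, this keeps the `geometry` column name and does not add a `level_1` column. Assuming shapely 2.x, where geometries are not list-like, single `Polygon` cells should likewise pass through untouched.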

Please let me know if there is a way to do this in one step!
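For a one-step approach, GeoPandas provides `GeoDataFrame.explode`, which splits multi-part geometries into one row per part and leaves single Polygons alone. A hedged sketch, assuming GeoPandas 0.10 or later (where the `index_parts` argument exists) and two hypothetical squares in place of the shapefile, which would normally be loaded with `gpd.read_file`:

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, Polygon

# Hypothetical stand-in for the shapefile contents.
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
shifted = Polygon([(2, 0), (3, 0), (3, 1), (2, 1)])
gdf = gpd.GeoDataFrame(
    {"name": ["AB10", "AB11"]},
    geometry=[square, MultiPolygon([square, shifted])],
)

# One row per polygon: Polygon rows are kept, MultiPolygon rows are split.
# index_parts=False repeats the original index instead of adding a
# part-number level; reset_index gives a clean 0..n-1 index.
exploded = gdf.explode(index_parts=False).reset_index(drop=True)
print(exploded)
```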

If the DataFrame has only two columns, NumPy can do this faster (the `reshape(-1, 2)` below assumes exactly two).

Given a DataFrame like

    name                geometry
0     0               polygn(x)
1     2  (polygn(x), polygn(x))
2     3               polygn(x)
3     4  (polygn(x), polygn(x))

then `np.meshgrid` can pair each name with each of its geometries:

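import numpy as np
import pandas as pd

def cartesian(x):
    # meshgrid pairs each row's name with every geometry in that row; vstack joins the ragged per-row arrays
    return np.vstack([np.array(np.meshgrid(*i)).T.reshape(-1, 2) for i in x.values])

ndf = pd.DataFrame(cartesian(df), columns=df.columns)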

Output:

  name   geometry
0    0  polygn(x)
1    2  polygn(x)
2    2  polygn(x)
3    3  polygn(x)
4    4  polygn(x)
5    4  polygn(x)

Timing comparison against the two-step `apply(pd.Series).stack()` approach (IPython `%%timeit`):

%%timeit
ndf = pd.DataFrame(cartesian(df),columns=df.columns)

1000 loops, best of 3: 679 µs per loop

%%timeit
df.set_index(['name'])['geometry'].apply(pd.Series).stack().reset_index()

100 loops, best of 3: 5.44 ms per loop
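`cartesian` is tied to exactly two columns by `reshape(-1, 2)`. A different NumPy technique, swapped in here rather than taken from the answer, is to repeat each row position once per part with `np.repeat`, which keeps any number of other columns. A sketch using stand-in data like the example above (hypothetical strings for polygons, tuples for already-split MultiPolygons); `expand_rows` is a hypothetical helper:

```python
import numpy as np
import pandas as pd

# Stand-in data: strings play the role of polygons and tuples the role
# of MultiPolygons already split into their parts.
df = pd.DataFrame({
    "name": [0, 2, 3, 4],
    "geometry": ["p0", ("p2a", "p2b"), "p3", ("p4a", "p4b")],
})

def expand_rows(frame, col="geometry"):
    # Wrap scalars in a 1-tuple so every cell is a sequence of parts.
    parts = [v if isinstance(v, (list, tuple)) else (v,) for v in frame[col]]
    counts = np.array([len(p) for p in parts])
    # Repeat each row position once per part; this keeps all other columns.
    out = frame.iloc[np.repeat(np.arange(len(frame)), counts)].copy()
    # Flatten the parts in the same row order and overwrite the column.
    out[col] = [g for p in parts for g in p]
    return out.reset_index(drop=True)

ndf = expand_rows(df)
print(ndf)
```

With real shapely geometries, the tuple check would need to become a MultiPolygon check that reads the parts from `.geoms`.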