How to speed up calculating geometry area of 2D boundary arrays?

210 Views Asked by At

Background

I have four 2D arrays: lon_centers, lat_centers, lon_bnds, and lat_bnds. bnds means the boundary of each pixel and centers stands for the center of pixels.

Data overview

import numpy as np
import matplotlib.pyplot as plt

lon_bnds = np.array([[-77.9645  , -77.56074 , -77.162025, -76.76827 , -76.37937 ],
                    [-77.88815 , -77.48613 , -77.08915 , -76.69711 , -76.30993 ],
                    [-77.811676, -77.41139 , -77.01614 , -76.62582 , -76.24034 ],
                    [-77.73638 , -77.337814, -76.944275, -76.55565 , -76.17186 ],
                    [-77.66197 , -77.265114, -76.87326 , -76.48632 , -76.1042  ]])

lat_bnds = np.array([[-77.34674 , -77.35804 , -77.36858 , -77.378395, -77.38752 ],
                    [-77.28847 , -77.299614, -77.31001 , -77.31969 , -77.328674],
                    [-77.23022 , -77.24122 , -77.25147 , -77.26101 , -77.26986 ],
                    [-77.17193 , -77.182785, -77.192894, -77.20229 , -77.211006],
                    [-77.11363 , -77.12434 , -77.13431 , -77.14357 , -77.15215 ]])

lon_centers = np.array([[-77.72404 , -77.323685, -76.92833 , -76.53787 ],
                        [-77.64892 , -77.2503  , -76.85666 , -76.46791 ],
                        [-77.57335 , -77.17646 , -76.78454 , -76.3975  ],
                        [-77.499626, -77.10444 , -76.71421 , -76.32886 ]])

lat_centers = np.array([[-77.32333 , -77.334175, -77.344284, -77.353676],
                        [-77.264946, -77.27564 , -77.28561 , -77.29487 ],
                        [-77.20665 , -77.21719 , -77.22702 , -77.23614 ],
                        [-77.14826 , -77.15867 , -77.16835 , -77.17734 ]])

plt.scatter(lon_bnds, lat_bnds, label='corner')
plt.scatter(lon_centers, lat_centers, marker='x', c='r', label='center')
plt.legend()

example

Goal

The goal is to calculate the area (m2) of each pixel, which means area has the same shape as lon_centers and lat_centers. I found this useful question for calculating the area of one geo-polygon. However, it needs inputs in order and loop of the 2D arrays.

Attempt

from pyproj import Geod

geod = Geod(ellps="WGS84")

areas = []

for x in range(lon_bnds.shape[0]-1):
    for y in range(lon_bnds.shape[1]-1):
        lons = lon_bnds[x:x+2, y:y+2].ravel()
        lats = lat_bnds[x:x+2, y:y+2].ravel()
        lons[-2], lons[-1] = lons[-1], lons[-2]
        lats[-2], lats[-1] = lats[-1], lats[-2]

        poly_area, poly_perimeter = geod.polygon_area_perimeter(lons, lats)
        areas.append(poly_area)

areas = np.asarray(areas).reshape(lon_centers.shape)

The for loop is too slow if there're more pixels. Is it possible to speed it up?

2

There are 2 best solutions below

5
On

Here's something you can consider when you are doing multiple for loops onto arrays.

# Without caching
def function1():
    geod = Geod(ellps="WGS84")

    areas = []

    for x in range(lon_bnds.shape[0]-1):
        for y in range(lon_bnds.shape[1]-1):
            lons = lon_bnds[x:x+2, y:y+2].ravel()
            lats = lat_bnds[x:x+2, y:y+2].ravel()
            lons[-2], lons[-1] = lons[-1], lons[-2]
            lats[-2], lats[-1] = lats[-1], lats[-2]

            poly_area, poly_perimeter = geod.polygon_area_perimeter(lons, lats)
            areas.append(poly_area)
    return areas
# With caching
from functools import lru_cache

@lru_cache(maxsize=512)
def function2():
    geod = Geod(ellps="WGS84")

    areas = []

    for x in range(lon_bnds.shape[0]-1):
        for y in range(lon_bnds.shape[1]-1):
            lons = lon_bnds[x:x+2, y:y+2].ravel()
            lats = lat_bnds[x:x+2, y:y+2].ravel()
            lons[-2], lons[-1] = lons[-1], lons[-2]
            lats[-2], lats[-1] = lats[-1], lats[-2]

            poly_area, poly_perimeter = geod.polygon_area_perimeter(lons, lats)
            areas.append(poly_area)
    return areas
%timeit function1()
229 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit function2()
84.3 ns ± 2.8 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
4
On

It's very convenient that you have the points arranged in a grid so you know how they match up. The general formula for the area of a planar quadrilateral given a list of points arranged in a counter-clockwise direction is:

A = 0.5 * ((x1 * y2 + x2 * y3 + x3 * y4 + x4 * y1) -
           (x2 * y1 + x3 * y2 + x4 * y3 + x1 * y4))

This won't do you much good if x and y are in degrees, but you can convert them to meters on a flat local approximation of the surface for a pretty good estimate. To do that, you need to know the local radius of the Earth, R. You can use the average radius of 6371km, or you can use R = sqrt((R1**2 * cos(lat))**2 + (R2**2 * sin(lat))**2) / ((R1 * cos(lat))**2 + (R2 * sin(lat))**2), with R1 = 6378.137km and R2 = 6356.752. Let's use the mean lat/lon of the data as the reference point:

R1 = 6378.137
R2 = 6356.752

lons = np.deg2rad(lon_bnds)
lats = np.deg2rad(lat_bnds)
R = np.sqrt(((R1**2 * np.cos(lats))**2 + (R2**2 * np.sin(lats))**2) / ((R1 * np.cos(lats))**2 + (R2 * np.sin(lats))**2))
lon_ref = np.mean(lons, axis=None)
lat_ref = np.mean(lats, axis=None)
x = R * (lons - lon_ref) * np.cos(lats)
y = R * (lats - lat_ref)

Here is a plot similar to your original, but converted to meters:

lon_c = np.deg2rad(lon_centers)
lat_c = np.deg2rad(lat_centers)
Rc = np.sqrt(((R1**2 * np.cos(lat_c))**2 + (R2**2 * np.sin(lat_c))**2) / ((R1 * np.cos(lat_c))**2 + (R2 * np.sin(lat_c))**2))
xc = Rc * (lon_c - lon_ref) * np.cos(lat_c)
yc = Rc * (lat_c - lat_ref)
plt.scatter(x, y)
plt.scatter(xc, yc, marker='x', c='r')

enter image description here

A = 0.5 * ((x[1:, :-1] * y[:-1, :-1] + x[1:, 1:] * y[1:, :-1] + x[:-1, 1:] * y[1:, 1:] + x[:-1, :-1] * y[:-1, 1:]) - (x[:-1, :-1] * y[1:, :-1] + x[1:, :-1] * y[1:, 1:] + x[1:, 1:] * y[:-1, 1:] + x[:-1, 1:] * y[:-1, :-1]))

The result is

array([[65.80938398, 64.91795373, 64.04493782, 63.19815103],
       [65.77719064, 64.87959934, 64.01155424, 63.16247495],
       [65.77861439, 64.87956589, 64.01289216, 63.16872493],
       [65.755155  , 64.85808632, 63.98664385, 63.14105705]])

Units are square kilometers.


For larger quadrangles, you may want to project each one individually about its center. In that case, you can do something like this:

def xy(s1, s2):
    x = R[s1, s2] * (lons[s1, s2] - lon_c) * np.cos(lats[s1, s2])
    y = R[s1, s2] * (lats[s1, s2] - lat_c)
    return x, y
x1, y1 = xy(slice(None, -1), slice(None, -1))
x2, y2 = xy(slice(None, -1), slice(1, None))
x3, y3 = xy(slice(1, None), slice(1, None))
x4, y4 = xy(slice(1, None), slice(None, -1))
A = 0.5 * ((x1 * y2 + x2 * y3 + x3 * y4 + x4 * y1) -
           (x2 * y1 + x3 * y2 + x4 * y3 + x1 * y4))

In this case, the result is similar:

array([[65.80950493, 64.9180908 , 64.04508934, 63.19831548],
       [65.77721436, 64.87964039, 64.01161095, 63.16254575],
       [65.77854026, 64.87951032, 64.01285345, 63.16870148],
       [65.75498209, 64.85793324, 63.98650883, 63.14093847]])

The areas computed this way are almost identical:

array([[65.80950493, 64.9180908 , 64.04508934, 63.19831548],
       [65.77721436, 64.87964039, 64.01161095, 63.16254575],
       [65.77854026, 64.87951032, 64.01285345, 63.16870148],
       [65.75498209, 64.85793324, 63.98650883, 63.14093847]])

The differences are just a few tens of square meters:

array([[-1.20941134e-04, -1.37064710e-04, -1.51528127e-04, -1.64456803e-04],
       [-2.37217200e-05, -4.10480434e-05, -5.67132960e-05, -7.08008247e-05],
       [ 7.41315753e-05,  5.55721762e-05,  3.87051454e-05,  2.34522307e-05],
       [ 1.72913145e-04,  1.53083044e-04,  1.35020126e-04,  1.18583183e-04]])