Fastest approach to checking whether a points sits within an n-dimensional simplex

297 Views Asked by At

I have some very large datasets and part of the analysis pipeline is to determine, as the title says, whether each point is bound by a simplex in n dimensions. I'm trying to find a way to calculate this fast without parallelisation if possible. One of the hurdles is that the dimensionality of the datasets varies, so the solution needs to be apply to any dimensions, rather than being fixed at 2D or 3D, for example.

However, for simplicity's sake, I have used 2D examples as they are easy to represent, but in theory, the maths should hold.

Barycentric Coordinates

My initial thought was to use barycentric coordinates, converting from cartesians, as is done here but my implementation of this method proves to be untrustworthy to say the least:

import numpy as np
import matplotlib.pyplot as plt

def is_in_simplex(point, T_inv, simplex):
    first_n = np.matmul(
        T_inv, (point - simplex[-1])
    )
    last = 1 - np.sum(first_n)
    bary = np.concatenate((first_n, [last]))
    return np.all((bary <= 1) & (bary >= 0))

# setup

simplex = np.array([[0, 0,], [8, 8,], [10, 3]])
rng = np.random.default_rng()
test_points = rng.random((10, 2))*10

# Maths starts here

T = np.array(simplex[:-1] - simplex[-1]).T
T_inv = np.linalg.inv(T)
within = np.vectorize(is_in_simplex, excluded={1, 2})(test_points, T_inv, simplex)

# drawing

polygon = np.concatenate([simplex, np.array([simplex[0]])])
print()
plt.plot(*polygon.T)
plt.scatter(*test_points.T)
for i, p in enumerate(test_points, 0):
    print(f"{i}\t{p}\t{test_points[i]}\t{within[i]}")
    plt.annotate(i, p)

And the output of this is:

0   [4.15391239 4.85852344] [4.15391239 4.85852344] [ True  True]
1   [5.24829898 9.22879891] [5.24829898 9.22879891] [ True False]
2   [3.31255765 0.75891285] [3.31255765 0.75891285] [ True  True]
3   [3.67468612 1.30045647] [3.67468612 1.30045647] [ True  True]
4   [9.95049042 5.932782  ] [9.95049042 5.932782  ] [False  True]
5   [8.42621723 6.35824573] [8.42621723 6.35824573] [False  True]
6   [4.19569122 3.41275362] [4.19569122 3.41275362] [ True  True]
7   [1.57324033 8.00273677] [1.57324033 8.00273677] [False False]
8   [1.9183791  0.54945207] [1.9183791  0.54945207] [ True  True]
9   [0.52448473 7.77920839] [0.52448473 7.77920839] [False  True]

Barycentric 2D simplex plot

The first column is the index, the second are the cartesian coordinates, the third should be first two barycentric coordinates (should assume they add to 1) and the fourth column should show whether the point lies within the simplex or not.

As you may have noticed, there are a few things wrong. Points 3, 5 and 6 should be labelled as within the simplex, but their barycentric coordinates are completely wrong. As they are bound by the simplex, the barycentric coordinates should be greater than 0 but sum to 1. And the the output of is_in_simplex() is an array, whereas it should be a single boolean for each point.

Not including the RNG, printing and plotting, this takes 0.0383 seconds for ten points, 0.0487 for 100, 0.0994 for 1,000 and 0.523 for 10,000.

Linear Programming

Another approach would be to use a linear programming, but something is off as my timings are far greater than those reported here (second answer, which I used as a starting point for this).

import numpy as np
from scipy.optimize import linprog
import time

def vectorizable(point, simplexT, coeffs):
    b = np.r_[point, np.ones(1)]
    lp = linprog(coeffs, A_eq = simplexT, b_eq = b)
    return lp.success

dims = 2
rng = np.random.default_rng()
test_points = rng.random((10, dims))*10
simplex = np.array([[0, 0,], [8, 8,], [10, 3]])
coeffs = np.zeros(len(simplex))
simplex_T = np.r_[simplex.T,np.ones((1,len(simplex)))]

start_time = time.time()
in_simplex = np.vectorize(vectorizable,
                        excluded={1, 2},
                        signature="(n) -> ()")(test_points, simplex_T, coeffs)

print(f"----- {time.time() - start_time} seconds -----")

polygon = np.concatenate([simplex, np.array([simplex[0]])])
print()
plt.plot(*polygon.T)
plt.scatter(*test_points.T)
for i, p in enumerate(test_points, 0):
    print(f"{i}\t{p}\t{in_simplex[i]}")
    plt.annotate(i, p)

This time, I get the desired result:

----- 0.019016504287719727 seconds -----

0   [5.90479358 5.75174668] True
1   [0.51156474 0.86088186] False
2   [9.22371526 4.025967  ] True
3   [9.35307399 5.38630723] False
4   [2.83575442 5.66318545] False
5   [7.89786072 6.06068206] True
6   [0.09838826 1.38358132] False
7   [3.19776368 9.73562359] False
8   [9.9122709  0.76862067] False
9   [4.52352281 6.2259428 ] False

Lin Prog 2D plot

For 10, 100 and 1,000 points, the timings are more or less in the same order of magnitude. However, when I jump to 10,000 points, I'm suddenly looking at anywhere between 4 and 8 seconds, which is far too slow, and only increases into tens of seconds and minutes when I increase dimensionality.

As mentioned, I would like to avoid parallelisation where possible. Any help/advice regarding the barycentric part would be greatly appreciated, particularly if, if it could work, would be faster than the linear programming approach. And is there any way to accelerate the LP method?

Thanks

3

There are 3 best solutions below

0
Reinderien On

Linear-algebraic approach (I don't think LP is called-for here). Do a hyperplanar half-space test with one matrix multiplication and then some max() and sign() post-processing.

You can get more clever by performing a rectilinear trim before any half-space tests, and partitioning the matrix multiplication and halting when any one half-space test fails. It helps the most when some significant portion of the points test outside of the simplex. In the most extreme case, if no points are contained in the simplex (try radius=1.1), the non-partitioned algorithm takes ~0.6 seconds and with 50 partitions takes ~0.01 seconds.

from time import monotonic

import numpy as np
from numpy.random import default_rng
from scipy.spatial import ConvexHull


def make_homogeneous(test_points: np.ndarray) -> np.ndarray:
    """
    Pre-process an array of (p, ndim) test points into a homogeneous
    transformation matrix of size (ndim+1, p). This only needs to be
    done once for a given point cloud.
    """
    return np.vstack((
        test_points.T,
        np.ones(shape=test_points.shape[0], dtype=test_points.dtype),
    ))


def test_hull(
    hull: ConvexHull, test_homogeneous: np.ndarray,
    n_partitions: int = 1, trim: bool = True,
) -> np.ndarray:
    """
    Vectorized test of whether each test point falls within the simplex.

    :param hull: Hull defining the simplex. Number of dimensions (i.e. hull.equations.shape[1]-1)
                 must be equal to number of dimensions in the test point cloud.
    :param test_homogeneous:
                 Test point cloud, in homogeneous transformation matrix format (from
                 make_homoegeneous()).
    :param n_partitions:
                 Number of inner product partitions. If the number of points falling inside of the
                 simplex is high, partitioning will not help and this should be left as 1 (non-
                 partitioned). If the number of points falling inside of the simplex is low, set
                 this on the order of ~ 1% of the number of hull faces.
    :param trim: Whether to perform a rectilinear trim before any dot products.
    :return: A boolean array with length as the number of test points: true for "inside", false for
             "outside". Values exactly on the simplex boundary are treated as "true" (inside the
             simplex) due to `<= 0` below.
    """

    # m: number of hull faces (to be partitioned)
    # n: 1 + ndim
    # p: number of test points
    m, n = hull.equations.shape
    n, p = test_homogeneous.shape
    partition_size = m // n_partitions

    if trim:
        extents0 = hull.points.min(axis=0)[:, np.newaxis]
        extents1 = hull.points.max(axis=0)[:, np.newaxis]
        inside = (
            (test_homogeneous[:3, :] >= extents0).all(axis=0) &
            (test_homogeneous[:3, :] <= extents1).all(axis=0)
        )
        test_subset = test_homogeneous[:, inside]
        # print(f'Trimmed to {np.count_nonzero(inside)}/{p} points')
    else:
        inside = np.ones(shape=p, dtype=bool)
        test_subset = test_homogeneous

    for i in range(0, m, partition_size):
        hull_subset = hull.equations[i: i + partition_size, :]
        product = hull_subset @ test_subset

        inside_subset = product.max(axis=0) <= 0
        # inside_subset = (product < 0).all(axis=0)  # Equivalent, marginally slower?

        inside[inside] = inside_subset
        if not inside_subset.any():
            break

        test_subset = test_subset[:, inside_subset]

    return inside


def cube_test() -> None:
    """
    Unit test for a cube-shaped hull (2 hull facets per cube side, for 12 facets total)
    """
    hull = ConvexHull([
        [-1, -1, -1],
        [ 1, -1, -1],
        [-1,  1, -1],
        [ 1,  1, -1],
        [-1, -1,  1],
        [ 1, -1,  1],
        [-1,  1,  1],
        [ 1,  1,  1],
    ])
    in_points = np.array([[ 0. ,  0. ,  0. ],
                          [ 0.9,  0. ,  0.2],
                          [-0.5, -0.2, -0.3],
                          [-0.9,  0.4,  0.6],
                          [ 0.1,  0.1,  0.1]])
    bound_points = np.array([[ 1. ,  1. ,  1. ],
                             [-1. , -1. , -1. ],
                             [ 0.5,  0. ,  1. ],
                             [ 1. ,  1. ,  0. ],
                             [ 0. ,  0. ,  1. ]])
    out_points = np.array([[ 2. ,  0. ,  0. ],
                           [ 1. ,  1. ,  1.5],
                           [ 0. ,  0. , -2. ],
                           [ 0. ,  1.1,  0. ],
                           [-1.1,  0. ,  1.2]])
    assert np.all(test_hull(hull, make_homogeneous(in_points)))
    assert np.all(test_hull(hull, make_homogeneous(bound_points)))
    assert not np.any(test_hull(hull, make_homogeneous(out_points)))

    assert np.all(test_hull(hull, make_homogeneous(in_points), n_partitions=4))
    assert np.all(test_hull(hull, make_homogeneous(bound_points), n_partitions=4))
    assert not np.any(test_hull(hull, make_homogeneous(out_points), n_partitions=4))


def random_hemisphere(
    rand, n: int, radius: float = 1,
    centre: tuple[float, float, float] = (0, 0, 0),
    theta_limit=np.pi/2,
) -> np.ndarray:
    """
    Generate a 3D hemisphere with randomly-distributed vertices. The "cut" face of the hemisphere is
    not guaranteed to be exact when processed as a convex hull.
    """
    phi = rand.uniform(low=0, high=2*np.pi, size=n)
    theta = rand.uniform(low=0, high=theta_limit, size=n)
    cx = np.sin(theta)*np.cos(phi)
    cy = np.sin(theta)*np.sin(phi)
    cz = np.cos(theta)
    return radius*np.stack((cx, cy, cz), axis=1) + centre


def hemisphere_test() -> None:
    """
    Unit test for a hemisphere-shaped ("bowl") convex hull.
    """
    rand = default_rng(seed=0)

    centre = 5, 10, 12  # not the barycentre: only the centre of the "cut" face

    hull = ConvexHull(random_hemisphere(rand, n=100, radius=2, centre=centre))

    test_in = make_homogeneous(random_hemisphere(rand, n=150, radius=1.5, centre=centre, theta_limit=np.pi*0.4))
    indicators = test_hull(hull, test_in)
    assert np.all(indicators)

    test_close = make_homogeneous(random_hemisphere(rand, n=150, radius=1.975, centre=centre, theta_limit=np.pi*0.45))
    indicators = test_hull(hull, test_close)
    mean = indicators.mean()
    assert 0.48 <= mean <= 0.52  # 0.5067 for this seed

    test_out = make_homogeneous(random_hemisphere(rand, n=150, radius=2.1, centre=centre))
    indicators = test_hull(hull, test_out)
    assert not np.any(indicators)


def time_test() -> None:
    """
    Output timings for non-partitioned and partitioned configurations of a hemisphere hull test
    """
    rand = default_rng(seed=0)
    n = 10_000
    hull_pts = random_hemisphere(rand, n=n)
    test_pts = random_hemisphere(rand, n=n, radius=0.9999)

    t0 = monotonic()
    homogeneous = make_homogeneous(test_pts)
    t1 = monotonic()
    hull = ConvexHull(hull_pts)
    t2 = monotonic()
    print('n =', n)
    print(f'make_homogeneous: {t1-t0:.4f} s')
    print(f'ConvexHull:       {t2-t1:.4f} s')

    t3 = monotonic()
    indicators = test_hull(hull, homogeneous)
    t4 = monotonic()
    print(f'test_hull(part=   1): {t4-t3:.4f} s, {np.count_nonzero(indicators)} inside')

    for part in (5, 10, 20, 50, 100, 200, 500, 1_000):
        t5 = monotonic()
        indicators = test_hull(hull, homogeneous, n_partitions=part)
        t6 = monotonic()
        print(f'test_hull(part={part:4}): {t6-t5:.4f} s, {np.count_nonzero(indicators)} inside')


if __name__ == '__main__':
    cube_test()
    hemisphere_test()
    time_test()
n = 10000
make_homogeneous: 0.0000 s
ConvexHull:       0.0310 s
test_hull(part=   1): 0.8910 s, 3632 inside
test_hull(part=   5): 0.5780 s, 3632 inside
test_hull(part=  10): 0.4380 s, 3632 inside
test_hull(part=  20): 0.4060 s, 3632 inside
test_hull(part=  50): 0.3750 s, 3632 inside
test_hull(part= 100): 0.3910 s, 3632 inside
test_hull(part= 200): 0.3750 s, 3632 inside
test_hull(part= 500): 0.4060 s, 3632 inside
test_hull(part=1000): 0.4690 s, 3632 inside
0
aerobiomat On

If you define a simplex as an array of three lines of coefficients a,b,c (where ax+by+c=0), your simplex example becomes:

>>> simplex = np.array([(1, -1, 0), (-5, -2, 56), (-3, 10, 0)])

And if you represent the test points in homogeneous coordinates:

>>> test_points = np.array([[5.90479358, 5.75174668, 1],
                            [0.51156474, 0.86088186, 1],
                            [9.22371526, 4.025967  , 1],
                            [9.35307399, 5.38630723, 1],
                            [2.83575442, 5.66318545, 1],
                            [7.89786072, 6.06068206, 1],
                            [0.09838826, 1.38358132, 1],
                            [3.19776368, 9.73562359, 1],
                            [9.9122709,  0.76862067, 1],
                            [4.52352281, 6.2259428 , 1]])

Then the following expression determines which points lie inside:

>>> (test_points @ simplex.T > 0).all(axis=1)
array([ True, False,  True, False, False,  True, False, False, False,
       False])

Beware, your simplex example is defined clockwise so the test is > 0. For a counterclockwise simplex, the test should be < 0.

You also need to consider whether points on the boundary are inside or not, considering >= or <=.

0
Stéphane Laurent On

Here is a way. It is written for dimension 3 (i.e. the simplex is a tetrahedron) but it is straightforwardly generalizable to any dimension.

import numpy as np

# the four vertices of the tetrahedron (simplex of dimension 3) 
p1 = np.array([0, 0, 0])
p2 = np.array([10, 3, 0])
p3 = np.array([8, 8, 0])
p4 = np.array([5, 3, 2]) 

# the B matrix
A = np.transpose(np.array([p1 - p4, p2 - p4, p3 - p4]))
B = np.linalg.inv(A)              

# take a point
p = np.array([6, 4, 0.5])
# compute w and h
w = B @ (p - p4)
h = np.sum(w)

# test whether the point p is in the simplex <=> test <= 0
test = np.max([h - 1, -h, -w[0], -w[1], -w[2]])

For this choice of the point p, one has test < 0. That means that p is strictly contained in the simplex.

You can check that test = 0 if you take for p a vertex of the simplex.

Note that I'm not fluent in Python, maybe the code can be better written.