Quickly filtering geopositions in or out of a multipolygon with C#

444 Views Asked by At

I am using NetTopologySuite with C# to filter points inside the precise boundaries of a country with a pretty simple way:

var path = "fr.shp"; // "big" country and boundaries need to be precise in my case

var reader = new ShapeDataReader(path);
var mbr = reader.ShapefileBounds;
var result = reader.ReadByMBRFilter(mbr);

var polygons = new List<Geometry>();

using (var coll = result.GetEnumerator())
{
    while (coll.MoveNext())
    {
        var item = coll.Current;

        if (item == null)
        {
            continue;
        }

        polygons.Add(item.Geometry);
    }
}

var polygon = new GeometryCombiner(polygons).Combine();

var points = new List<Point>();
List<Point> pointsToFilterWithBorders; // loaded from DB, not visible here but we have 1,350,000 points to filter

Parallel.ForEach(pointsToFilterWithBorders, point =>
{
    if (polygon.Contains(point))
        points.Add(point);
});

It's working fine (filtering works great!) but it's pretty slow... like one day to do the filtering for only 1,350,000 points!

Any idea on how to improve that?

I tried to use Parallel.ForEach but still very long and I tried to find something like a "batch" compare in NetTopologySuite but couldn't find a quicker solution to filter my points in this big shapefile...

3

There are 3 best solutions below

0
Denis Huvelle On BEST ANSWER

Thanks everyone for the help! At the end, I found a solution inspired by your ideas :-)

I can't share the code itself because it's inside a closed source code but here is the idea:

My code was already grouping all the points by squares of 1 km² for other purposes: I just had to check if these polygons where not fully inside the boundaries. When that was not the case, I just had to do a check on all points inside these squares!

Still a little bit long (but less than one hour for France and it's mixed with other pieces of code which were already a little bit slow) but clearly quicker!

0
JonasH On

Presumably the polygon that defines the border is quite large and detailed, otherwise it should not take such a long time. There are a few approaches I would consider

Do an initial Bounding box check

Create an axis aligned bounding box for the polygon. Start by testing if the point is inside this box before continuing with any more complex check. This should be very easy to implement.

Render the polygon

create a large bitmap and render your polygon to the bitmap, you will need so use some kind of transform to translate between GIS coordinate and pixel coordinates. Then you can simply check the pixel value for each point in the bitmap. Just make sure to disable anti aliasing. Note that this would be an approximate solution.

You could also do something like rendering the polygon in one color and then rendering the border in another color using a pen more than one pixel wide. Any points on the border would be uncertain and may need a more accurate test, while any points outside the border should be guaranteed to be either inside or outside your polygon.

Use a tree structure to speed up the check

The typical approach to speed up any kind of repeated search is to build some kind of search structure to speedup the search, this is often a tree. In this specific case I believe a Binary Space Partition tree might be suitable. The advantage here is that the number of checks needed would be O(log n) where n is the number of lines in your polygons, instead of the O(n) that I suspect the .Contains() method is.

But I have never implemented a BSP tree, so I refer to other sources on how to implement one.

You could also consider simplifying your border polygon, but it might be more difficult to ensure the simplified polygon is either fully contained by the true polygon, or fully contains the true polygon.

Note that most methods assume that you are using a planar coordinate system, so you might need to do some conversions if you are using anything else.

0
FObermaier On

If possible use spatial capabilities of your database and only load points into your set that possibly intersect with your MultiPolygon, i.e. they are withing the bounding box of it.
Then for 1:n geometry checks use NetTopologySuite's PreparedGeometry predicates:

// polygon and pointsToFilterWithBorders from sample above
var prep = NetTopologySuite.Geometries.Prepared.PreparedGeometryFactory.Prepare(polygon);
var points = new System.Collections.Generic<Point>();
foreach(var pt in pointsToFilterWithBorders)
   if (prep.Contains(pt)) points.Add(pt);