geofileops gives wrong result in intersection when data is big


I'm using geofileops to speed up geopandas operations, but I have run into this problem:

import datetime
import time

import geofileops as gfo
import geopandas as gpd

from detection_algorithm.utils.database_connection import DatabaseConnection

crs = 'EPSG:2154'
base_year = datetime.datetime.now().year - 2
query = "SELECT geometry FROM departments WHERE \"INSEE_DEP\" = '71'"
database_engine = DatabaseConnection().con
department_gdf = gpd.read_postgis(query, database_engine, geom_col="geometry")
department_gdf.crs = crs

department_gdf.to_file("/tmp/department.gpkg")

current_difference_layer = gpd.read_postgis(
    "SELECT geometry FROM oso WHERE departement='71' and \"Classe\" IN (5, 6, 7, 8, 9, 10, 11, 12, 13)",
    database_engine,
    geom_col="geometry"
)
print("JUST AFTER OSO")
# Set the CRS before writing so the GeoPackage is saved with it
current_difference_layer.crs = crs
current_difference_layer.to_file("/tmp/oso.gpkg")

for year in range(base_year, base_year - 1, -1):
    print("HERE")
    rpg_gdf = gpd.read_postgis(
        "select geometry from rpg where region=1 and annee_d_export={}".format(year),
        database_engine,
        geom_col="geometry"
    )
    start_time = time.time()

    # Write the reprojected GeoDataFrame to a GeoPackage (GPKG) file on disk
    file_path = "/tmp/rpg.gpkg"
    gdf = rpg_gdf.to_crs(crs)
    gdf.to_file(file_path)

    gfo.intersection(file_path, "/tmp/department.gpkg", "/tmp/rpg_dep.gpkg")
    rpg = gpd.read_file("/tmp/rpg_dep.gpkg")
    print("rpg LEN: " + str(len(rpg)))

With this code, when I limit the rows returned by the query select geometry from rpg where region=1 and annee_d_export={} (for example to 10K rows) and then intersect rpg with the department, I get a correct result. But I need all of the rpg data to be intersected, and when I select without a limit (about 600k rows) I get rpg LEN: 0. That is clearly wrong, because the limited run does return intersecting rows. Thank you!

I tried limiting the data and it worked.

By the way, when I don't limit the data, all my CPUs run at 100%, so maybe there is a problem in the parallelization.
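Since the limited query works and the full one does not, one way to narrow this down might be to compare what the two inputs contain before they are written to disk. The sketch below is only an illustration: summarize_geometries is a hypothetical helper (not part of geopandas or geofileops), and the sample geometries are made up. It counts geometry types, empty geometries and invalid geometries using standard GeoSeries properties.

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, Polygon


def summarize_geometries(gdf):
    """Hypothetical helper: count geometry types, empty and invalid geometries."""
    return {
        "rows": len(gdf),
        "types": gdf.geom_type.value_counts().to_dict(),
        "empty": int(gdf.geometry.is_empty.sum()),
        "invalid": int((~gdf.geometry.is_valid).sum()),
    }


# Made-up sample data standing in for the rows returned by the rpg query
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
far_square = Polygon([(5, 5), (6, 5), (6, 6), (5, 6)])
bowtie = Polygon([(0, 0), (1, 1), (1, 0), (0, 1)])  # self-intersecting ring, so invalid

sample = gpd.GeoDataFrame(geometry=[square, MultiPolygon([square, far_square]), bowtie])
summary = summarize_geometries(sample)
print(summary)
```

Running the same helper on the limited and the unlimited rpg_gdf would show whether the full data set contains geometry types or invalid geometries that the 10K sample does not.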

UPDATE: adding the line rpg_gdf = rpg_gdf.explode(index_parts=True) before the intersection (that is, before rpg_gdf is reprojected and written to /tmp/rpg.gpkg) fixed the problem. :)
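For readers unfamiliar with explode, here is a minimal sketch of what it does, using made-up geometries rather than the real rpg data: each multi-part geometry is split into one row per part, so the frame written to disk contains only single-part polygons. Why this changes the geofileops result is not established here; the sketch only shows the transformation itself.

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, Polygon

# Made-up geometries: one MultiPolygon with two parts, plus one plain Polygon
square_a = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
square_b = Polygon([(2, 2), (3, 2), (3, 3), (2, 3)])
multi_gdf = gpd.GeoDataFrame(geometry=[MultiPolygon([square_a, square_b]), square_a])

# Split every multi-part geometry into one row per part
single_gdf = multi_gdf.explode(index_parts=True)

geom_types_before = sorted(multi_gdf.geom_type.unique())
geom_types_after = sorted(single_gdf.geom_type.unique())
print(len(multi_gdf), geom_types_before)
print(len(single_gdf), geom_types_after)
```

Note that the row count grows after explode, so any length comparison between the input and the intersection output should use the exploded frame.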
