My goal is to compute zonal statistics for hundreds of rasters using the same shapefile and save the results as a CSV file. I wrote some code and it works fine for a small test shapefile, but not for my real project.
Here is the code I am using:
##### This code works for the small test case but not for the large one
import glob

import geopandas as gpd
import pandas as pd
import rasterio
from rasterstats import gen_zonal_stats, zonal_stats

# Open the shapefile
polygons = gpd.read_file('/test/LargeDomainBoundary.shp')

# Get a list of all raster files in a directory
raster_list = glob.glob('/test1/*.tif')
print(raster_list)

# Empty dataframe to collect the results
results_df = pd.DataFrame()

# Loop through the rasters
for raster_path in raster_list:
    # Open the raster file and read the first band into memory
    with rasterio.open(raster_path) as src:
        raster = src.read(1)
        print(raster.shape)
        trans = src.transform  # src.transform, not the deprecated src.affine
        print(trans)

    # Perform zonal statistics
    stats = zonal_stats(polygons, raster,
                        stats=['mean', 'sum', 'min', 'count', 'max'],
                        affine=trans, nodata=-32768)  # also tried nodata=-9999
    # stats = list(
    #     gen_zonal_stats(polygons, raster, affine=trans,
    #                     stats=['mean', 'sum', 'min', 'count', 'max'],
    #                     nodata=-32768, block_size=(1000, 1000)))

    # Dataframe from the stats, tagged with the raster it came from
    df = pd.DataFrame(stats)
    df["Raster"] = raster_path

    # Append to the running results
    results_df = pd.concat([results_df, df])

# Save the results to a CSV file
results_df.to_csv('/globalhome/test/zonal_stats_results.csv', index=False)
I am using zonal_stats from rasterstats to do the job. I also tried gen_zonal_stats, which can take a block_size so the raster is processed in smaller chunks; however, even a 100×100 block size did not solve the problem. My rasters are (13665, 30641) in size, and I get the following error:
Traceback (most recent call last):
  File "C:\GDAL\zonalusingsameshapefile.py", line 32, in
    stats = list(
  File "C:\Users\Anaconda3\envs\venvgdal\lib\site-packages\rasterstats\main.py", line 155, in gen_zonal_stats
    fsrc = rast.read(bounds=geom_bounds)
  File "C:\Users\Anaconda3\envs\venvgdal\lib\site-packages\rasterstats\io.py", line 298, in read
    new_array = boundless_array(
  File "C:\Users\Anaconda3\envs\venvgdal\lib\site-packages\rasterstats\io.py", line 179, in boundless_array
    out = np.ones(shape=window_shape) * nodata
  File "C:\Users\Anaconda3\envs\venvgdal\lib\site-packages\numpy\core\numeric.py", line 204, in ones
    a = empty(shape, dtype, order)
ValueError: array is too big; arr.size * arr.dtype.itemsize is larger than the maximum possible size.
Process finished with exit code 1