My goal is to compute zonal statistics for hundreds of rasters using the same shapefile and save the results as a CSV file. I wrote some code and it works fine for a small test shapefile, but not for my real project.
Here is the code I am using:
##### This code works for the small test case but not for the large one
import glob

import geopandas as gpd
import pandas as pd
import rasterio
from rasterstats import gen_zonal_stats, zonal_stats

# Open the shapefile
polygons = gpd.read_file('/test/LargeDomainBoundary.shp')

# Get a list of all raster files in a directory
raster_list = glob.glob('/test1/*.tif')
print(raster_list)

# Empty dataframe to collect the results
results_df = pd.DataFrame()

# Loop through the rasters
for raster_path in raster_list:
    # Open the raster file and read the first band into memory
    with rasterio.open(raster_path) as src:
        raster = src.read(1)
        print(raster.shape)
        trans = src.transform  # src.transform, not the deprecated src.affine
        print(trans)

    # Perform zonal statistics
    stats = zonal_stats(polygons, raster,
                        stats=['mean', 'sum', 'min', 'count', 'max'],
                        affine=trans, nodata=-32768)  # also tried nodata=-9999
    # stats = list(
    #     gen_zonal_stats(polygons, raster, affine=trans,
    #                     stats=['mean', 'sum', 'min', 'count', 'max'],
    #                     nodata=-32768, block_size=(1000, 1000)))

    # Dataframe from the stats, tagged with the raster it came from
    df = pd.DataFrame(stats)
    df["Raster"] = raster_path

    # Append to the running results
    results_df = pd.concat([results_df, df])

# Save the results to a CSV file
results_df.to_csv('/globalhome/test/zonal_stats_results.csv', index=False)
I am using zonal_stats from rasterstats to do the job. I also tried gen_zonal_stats, which can take a block_size so the raster is processed in smaller chunks; however, even a 100×100 block size did not solve the problem. My rasters are (13665, 30641) in size, and I get the following error:
Traceback (most recent call last):
  File "C:\GDAL\zonalusingsameshapefile.py", line 32, in
    stats = list(
  File "C:\Users\Anaconda3\envs\venvgdal\lib\site-packages\rasterstats\main.py", line 155, in gen_zonal_stats
    fsrc = rast.read(bounds=geom_bounds)
  File "C:\Users\Anaconda3\envs\venvgdal\lib\site-packages\rasterstats\io.py", line 298, in read
    new_array = boundless_array(
  File "C:\Users\Anaconda3\envs\venvgdal\lib\site-packages\rasterstats\io.py", line 179, in boundless_array
    out = np.ones(shape=window_shape) * nodata
  File "C:\Users\Anaconda3\envs\venvgdal\lib\site-packages\numpy\core\numeric.py", line 204, in ones
    a = empty(shape, dtype, order)
ValueError: array is too big; arr.size * arr.dtype.itemsize is larger than the maximum possible size.
Process finished with exit code 1