Driver error reading file geodatabase from S3 using geopandas


I'm trying to read a file geodatabase into a GeoDataFrame using the geopandas Python library. The geodatabase is on S3, so I'm using fsspec to read it, but I'm getting an error. Here's my setup:

import geopandas as gpd
import fsspec

fs = fsspec.filesystem('s3', profile='my-profile', anon=False)

Reading a GeoJSON file this way works:

# this runs w/o error
g_file = fs.open("my-bucket/my-file.geojson")
gdf = gpd.read_file(g_file)

But this causes an error:

gdb_file = fs.open("my-bucket/my-file.gdb/")
gdf = gpd.read_file(gdb_file, driver="FileGDB")

Here's the error traceback:

---------------------------------------------------------------------------
CPLE_OpenFailedError                      Traceback (most recent call last)
fiona/_shim.pyx in fiona._shim.gdal_open_vector()

fiona/_err.pyx in fiona._err.exc_wrap_pointer()

CPLE_OpenFailedError: '/vsimem/83f6a4d8051c449c86c4c608520eb998' not recognized as a supported file format.

During handling of the above exception, another exception occurred:

DriverError                               Traceback (most recent call last)
<ipython-input-33-7245da312526> in <module>
----> 1 gdf = gpd.read_file(file, driver='FileGDB')

~/my-conda-envs/nwm/lib/python3.7/site-packages/geopandas/io/file.py in _read_file(filename, bbox, mask, rows, **kwargs)
    158 
    159     with fiona_env():
--> 160         with reader(path_or_bytes, **kwargs) as features:
    161 
    162             # In a future Fiona release the crs attribute of features will

~/my-conda-envs/nwm/lib/python3.7/site-packages/fiona/collection.py in __init__(self, bytesbuf, **kwds)
    554         # Instantiate the parent class.
    555         super(BytesCollection, self).__init__(self.virtual_file, vsi=filetype,
--> 556                                               encoding='utf-8', **kwds)
    557 
    558     def close(self):

~/my-conda-envs/nwm/lib/python3.7/site-packages/fiona/collection.py in __init__(self, path, mode, driver, schema, crs, encoding, layer, vsi, archive, enabled_drivers, crs_wkt, ignore_fields, ignore_geometry, **kwargs)
    160             if self.mode == 'r':
    161                 self.session = Session()
--> 162                 self.session.start(self, **kwargs)
    163             elif self.mode in ('a', 'w'):
    164                 self.session = WritingSession()

fiona/ogrext.pyx in fiona.ogrext.Session.start()

fiona/_shim.pyx in fiona._shim.gdal_open_vector()

DriverError: '/vsimem/83f6a4d8051c449c86c4c608520eb998' not recognized as a supported file format.

One other potential clue: I can get it to work simply by doing:

gdf = gpd.read_file("s3://my-bucket/my-file.gdb/", driver="FileGDB")

BUT only on a machine that is covered by the bucket access policy. What I want is to access the data from any machine, using the AWS credentials stored in the my-profile profile.
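One idea I haven't been able to verify yet: as far as I understand, GDAL's /vsis3/ handler reads the standard AWS environment variables (AWS_PROFILE, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY), so pointing it at the named profile before calling read_file might let the s3:// form work from any machine. Roughly:

import os
import geopandas as gpd

# Untested sketch: tell GDAL's /vsis3/ handler which credentials to use
# before geopandas/fiona open the dataset.
os.environ["AWS_PROFILE"] = "my-profile"

gdf = gpd.read_file("s3://my-bucket/my-file.gdb/", driver="FileGDB")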

Unfortunately, I can't provide a way to reproduce the error since I'm doing everything on the cloud. It works fine locally...

1 Answer
We are seeing similar issues using read-only keys for S3 locations and shapefiles (and possibly even NAS folders with read-only permissions).

Can you try with keys that have read-write permissions as well as keys with read-only permissions? My guess is that the GDAL drivers on the back end need write access even though only reading is desired.
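If it helps, here is a rough sketch of how you could swap in a specific key pair for that test. The key values are placeholders, and it assumes GDAL picks up the standard AWS environment variables:

import os
import geopandas as gpd

# Test 1: read-only keys (placeholders, not real values)
os.environ["AWS_ACCESS_KEY_ID"] = "READ_ONLY_KEY_ID"
os.environ["AWS_SECRET_ACCESS_KEY"] = "READ_ONLY_SECRET"
gdf = gpd.read_file("s3://my-bucket/my-file.gdb/", driver="FileGDB")

# Test 2: repeat in a fresh session with read-write keys and compare.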

The driver issue is hinted at in the last part of the error trace:

fiona/_shim.pyx in fiona._shim.gdal_open_vector()
DriverError: ...

If anyone can confirm the specifics of the permissions needed by the GDAL drivers, that would be great!
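In the meantime, turning on GDAL's debug output should show which S3 requests the driver actually issues, which would help narrow down the permissions question. A minimal sketch, assuming the CPL_DEBUG and CPL_CURL_VERBOSE config options are picked up from the environment:

import os
import geopandas as gpd

# Ask GDAL/CPL to log what it is doing, including the underlying curl requests.
os.environ["CPL_DEBUG"] = "ON"
os.environ["CPL_CURL_VERBOSE"] = "YES"

gdf = gpd.read_file("s3://my-bucket/my-file.gdb/", driver="FileGDB")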