xarray read remote grib file on s3 using cfgrib


Can the cfgrib engine handle reading remote files? According to Martin Durant's comment (https://github.com/ecmwf/cfgrib/issues/198#issuecomment-772852412), it doesn't look like it.

There is a smallish GRIB file hosted on S3: https://mf-nwp-models.s3.amazonaws.com/index.html#arpege-world/v2/2021-02-16/00/UGRD/10m/ (note: don't click on a file, as that will start a download).

When I try to read it using s3fs, I get the following error:

import s3fs
import xarray as xr

fs = s3fs.S3FileSystem(anon=True)

uri = "s3://mf-nwp-models/arpege-world/v2/2021-02-16/00/UGRD/10m/0h.grib2"

file = s3fs.S3Map(uri, s3=fs)
ds = xr.open_dataset(file, engine="cfgrib")

Can't create file '<File-like object S3FileSystem, mf-nwp-models/arpege-world/v2/2021-02-16/00/UGRD/10m/0h.grib2>.90c91.idx'
Traceback (most recent call last):
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/messages.py", line 342, in from_indexpath_or_filestream
    with compat_create_exclusive(indexpath) as new_index_file:
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/messages.py", line 274, in compat_create_exclusive
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
FileNotFoundError: [Errno 2] No such file or directory: '<File-like object S3FileSystem, mf-nwp-models/arpege-world/v2/2021-02-16/00/UGRD/10m/0h.grib2>.90c91.idx'
Can't read index file '<File-like object S3FileSystem, mf-nwp-models/arpege-world/v2/2021-02-16/00/UGRD/10m/0h.grib2>.90c91.idx'
Traceback (most recent call last):
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/messages.py", line 352, in from_indexpath_or_filestream
    index_mtime = os.path.getmtime(indexpath)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/genericpath.py", line 55, in getmtime
    return os.stat(filename).st_mtime
FileNotFoundError: [Errno 2] No such file or directory: '<File-like object S3FileSystem, mf-nwp-models/arpege-world/v2/2021-02-16/00/UGRD/10m/0h.grib2>.90c91.idx'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/backends/api.py", line 572, in open_dataset
    store = opener(filename_or_obj, **extra_kwargs, **backend_kwargs)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/backends/cfgrib_.py", line 45, in __init__
    self.ds = cfgrib.open_file(filename, **backend_kwargs)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/dataset.py", line 650, in open_file
    index = open_fileindex(path, grib_errors, indexpath, index_keys).subindex(filter_by_keys)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/dataset.py", line 637, in open_fileindex
    return stream.index(index_keys, indexpath=indexpath)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/messages.py", line 269, in index
    return FileIndex.from_indexpath_or_filestream(self, index_keys, indexpath)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/messages.py", line 370, in from_indexpath_or_filestream
    return cls.from_filestream(filestream, index_keys)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/messages.py", line 297, in from_filestream
    for message in filestream:
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/messages.py", line 240, in __iter__
    with open(self.path, 'rb') as file:
TypeError: expected str, bytes or os.PathLike object, not S3File
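
The last frame of the traceback points at the cause: cfgrib hands whatever it receives straight to the built-in open(), which accepts only a path, not an already opened file-like object. A minimal sketch reproducing the same TypeError without S3 is below. The helper name read_first_bytes is hypothetical, and io.BytesIO stands in for the S3File object as an assumption for illustration.

```python
import io
import os
import tempfile


def read_first_bytes(path, n=4):
    """Mimic the failing cfgrib frame: pass `path` straight to built-in open()."""
    with open(path, "rb") as f:
        return f.read(n)


# A file-like object (io.BytesIO stands in for S3File here) is rejected.
try:
    read_first_bytes(io.BytesIO(b"GRIB"))
    rejected = False
except TypeError:
    rejected = True

# A real path on local disk works, which is why a local copy is needed.
with tempfile.NamedTemporaryFile(suffix=".grib2", delete=False) as tmp:
    tmp.write(b"GRIB")
local_path = tmp.name
magic = read_first_bytes(local_path)
os.remove(local_path)
```

So any approach that ends up with a plain local path should get past this frame.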

Best answer

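I think I got it working with fsspec's local file caching (https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally). cfgrib needs a path on local disk, so the file is first downloaded to a local cache and the cached copy is opened instead: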

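import fsspec
import xarray as xr

# The "simplecache::" prefix makes fsspec download the remote file to local disk first
uri = "simplecache::s3://mf-nwp-models/arpege-world/v2/2021-02-16/00/UGRD/10m/0h.grib2"

# open_local returns the path of the cached copy; keyword arguments are keyed by protocol name,
# so the cache directory goes under "simplecache" to match the prefix in the URI
file = fsspec.open_local(uri, s3={'anon': True}, simplecache={'cache_storage': '/tmp/files'})

# cfgrib can now read (and index) an ordinary local file
ds = xr.open_dataset(file, engine="cfgrib")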
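
If this is needed for more than one file, the same approach can be wrapped in a small helper. This is only a sketch under stated assumptions: the function names to_cached_uri and open_remote_grib are hypothetical, fsspec, s3fs, xarray and cfgrib are assumed to be installed, and the cache directory is passed under the simplecache key on the assumption that fsspec routes keyword arguments to chained protocols by protocol name.

```python
def to_cached_uri(uri, cache_protocol="simplecache"):
    """Prefix a remote URI with an fsspec caching protocol (chained URL).

    Hypothetical helper: leaves the URI unchanged if it is already prefixed.
    """
    prefix = f"{cache_protocol}::"
    return uri if uri.startswith(prefix) else prefix + uri


def open_remote_grib(uri, cache_dir="/tmp/files", anon=True):
    """Download a remote GRIB file to a local cache and open it with cfgrib."""
    # Imported lazily so the URI helper above has no third-party dependencies.
    import fsspec
    import xarray as xr

    # open_local returns the local path of the cached copy for a single file.
    local_path = fsspec.open_local(
        to_cached_uri(uri),
        s3={"anon": anon},
        simplecache={"cache_storage": cache_dir},
    )
    return xr.open_dataset(local_path, engine="cfgrib")


# Example call (needs network access, so it is left commented out):
# ds = open_remote_grib(
#     "s3://mf-nwp-models/arpege-world/v2/2021-02-16/00/UGRD/10m/0h.grib2"
# )
```

Note that this downloads the whole file before opening it, so it does not give lazy remote reads; it only works around cfgrib's need for a local path.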