Reading xarray goes16 data directly from S3 without downloading into the system


I want to read GOES-16 data with xarray directly from S3, without downloading the files to the local system first. The issue is that I cannot concatenate the S3File objects. I am retrieving 24 files from S3 and want to read and extract the data from these files for the time range.

This is the code:

import datetime as dt
import xarray as xr
import fsspec
import s3fs

fs = fsspec.filesystem('s3', anon=True)

urls1 = []

for i in range(2):
    urls = [
        's3://' + f
        for f in fs.glob(f"s3://noaa-goes16/ABI-L2ACMC/2022/001/{i:02}/*.nc")
    ]
    urls1 += urls

with fs.open(urls1[0]) as fileObj:
    ds = xr.open_dataset(fileObj, engine='h5netcdf')

However, I run into the error "I/O operation on closed file".

There is 1 best solution below

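As with most file object interfaces in Python, opening a file-like object with a context manager closes the file on exit. xarray only reads the data lazily, so the dataset still depends on the file after the context has closed it. So in the following example: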

# use fs.open to create an S3File object
with fs.open(urls1[0], mode="rb") as fileObj:
    # open the netcdf for reading, but don't load the data - instead, just
    # establish a lazy-load connection to the underlying S3File object
    ds = xr.open_dataset(fileObj, engine='h5netcdf')

# <--
# exit the context, thereby closing the S3File object

# attempt to access the data again, after the stream is closed
ds.load()  # fails with "I/O operation on closed file"
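
The closing behaviour is not specific to S3File. As a minimal sketch, the same failure can be reproduced with an in-memory io.BytesIO object standing in for the S3 file (the stand-in bytes and the variable names here are illustrative, not part of the original code):

```python
import io

# Stand-in for the S3File: any file-like object behaves the same way here.
with io.BytesIO(b"netcdf bytes would live here") as file_obj:
    first_bytes = file_obj.read(6)  # reading inside the context works
    reader = file_obj               # keep a reference, as a lazy dataset would

# The context has exited, so the underlying object is now closed.
try:
    reader.read()
    error_message = None
except ValueError as exc:
    # Reading from a closed file-like object raises ValueError.
    error_message = str(exc)

print(first_bytes, reader.closed, error_message)
```

Holding a reference to the object does not keep it open; only the code inside the with block can safely read from it.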

Instead, you should either load all the data within the context manager:

with fs.open(urls1[0], mode="rb") as fileObj:
    with xr.open_dataset(fileObj, engine='h5netcdf') as ds:
        ds = ds.load()

Or, if you're planning to use the dataset in later code without loading it up front, open the file without a context manager and close it yourself:

fileObj = fs.open(urls1[0], mode="rb")
ds = xr.open_dataset(fileObj, engine='h5netcdf')

# other data operations

# be sure to close the connections when you're done
ds.close()
fileObj.close()
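
For the 24-file case in the question, one possible way to keep every file open until you are finished is contextlib.ExitStack. This is only a sketch: open_all is a hypothetical helper, and the two callables are passed in so that the helper itself does not depend on S3 or xarray. The commented usage at the bottom assumes the files can be concatenated along a dimension named "t", which is an assumption about the GOES files and not something confirmed above.

```python
from contextlib import ExitStack


def open_all(stack, urls, open_file, open_dataset):
    """Open every URL and register each handle on `stack` (hypothetical helper).

    `open_file` and `open_dataset` are injected callables that return
    context managers, e.g. fs.open and xr.open_dataset.
    """
    datasets = []
    for url in urls:
        # Register the raw file first: ExitStack unwinds in reverse order,
        # so each dataset is closed before the file it reads from.
        file_obj = stack.enter_context(open_file(url))
        datasets.append(stack.enter_context(open_dataset(file_obj)))
    return datasets


# Possible usage with the names from the question (not executed here):
#
# with ExitStack() as stack:
#     datasets = open_all(
#         stack,
#         urls1,
#         open_file=lambda url: fs.open(url, mode="rb"),
#         open_dataset=lambda f: xr.open_dataset(f, engine="h5netcdf"),
#     )
#     # "t" as the concat dimension is an assumption about the files
#     combined = xr.concat(datasets, dim="t").load()
```

Everything that touches the data, including the final load, has to happen inside the with block. Once the block exits, all datasets and files are closed together.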