Open a remote Zarr store with many groups and keep coordinates using xarray


I would like to read the remote Zarr store at https://hrrrzarr.s3.amazonaws.com/index.html#sfc/20210208/20210208_00z_anl.zarr/. Documentation for the store is at https://mesowest.utah.edu/html/hrrr/zarr_documentation/zarrFileVariables.html

I am able to read in a variable, but its coordinates and attributes don't seem to come along with it (I may well be missing kwargs to open_mfdataset or open_zarr). Because there are several levels of nesting, I'm not sure what the correct path to pass is:

import xarray as xr
import s3fs

fs = s3fs.S3FileSystem(anon=True)
uri = "s3://hrrrzarr/sfc/20210208/20210208_00z_anl.zarr/10m_above_ground/UGRD/10m_above_ground"

file = s3fs.S3Map(uri, s3=fs)
ds = xr.open_mfdataset([file], engine="zarr")
>>> ds
<xarray.Dataset>
Dimensions:  (projection_x_coordinate: 1799, projection_y_coordinate: 1059)
Dimensions without coordinates: projection_x_coordinate, projection_y_coordinate
Data variables:
    UGRD     (projection_y_coordinate, projection_x_coordinate) float16 dask.array<chunksize=(150, 150), meta=np.ndarray>
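
Opening the parent group instead picks up the coordinates, but the UGRD data variable is no longer there:
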
uri = "s3://hrrrzarr/sfc/20210208/20210208_00z_anl.zarr/10m_above_ground/UGRD"
file = s3fs.S3Map(uri, s3=fs)
ds = xr.open_mfdataset([file], engine="zarr")
>>> ds
<xarray.Dataset>
Dimensions:                  (projection_x_coordinate: 1799, projection_y_coordinate: 1059)
Coordinates:
  * projection_x_coordinate  (projection_x_coordinate) float64 -2.698e+06 ......
  * projection_y_coordinate  (projection_y_coordinate) float64 -1.587e+06 ......
Data variables:
    forecast_period          timedelta64[ns] ...
    forecast_reference_time  datetime64[ns] ...
    height                   float64 ...
    pressure                 float64 ...
    time                     datetime64[ns] ...
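One way to see which sub-path holds the data array and which holds only coordinates is to list the store's metadata keys. The sketch below assumes the Zarr v2 on-disk layout (a .zgroup file per group, a .zarray file per array), which the paths above suggest but which is assumed here rather than checked. classify_zarr_keys and list_layout are hypothetical helpers, not part of xarray, zarr, or s3fs; only S3FileSystem.find comes from s3fs.

```python
from posixpath import basename, dirname


def classify_zarr_keys(keys):
    """Map each group path to the sorted names of the arrays directly inside it.

    Assumes the Zarr v2 layout: "<group>/.zgroup" marks a group and
    "<group>/<array>/.zarray" marks an array. Chunk keys are ignored.
    The root group is reported under the empty path "".
    """
    layout: dict[str, list[str]] = {}
    for key in keys:
        name, parent = basename(key), dirname(key)
        if name == ".zgroup":
            # Register the group even if it holds no arrays.
            layout.setdefault(parent, [])
        elif name == ".zarray":
            # parent is the array's own directory; its parent is the group.
            layout.setdefault(dirname(parent), []).append(basename(parent))
    return {group: sorted(arrays) for group, arrays in layout.items()}


def list_layout(root):
    """Hypothetical helper: list groups and arrays under an S3 Zarr prefix (needs s3fs)."""
    import s3fs

    fs = s3fs.S3FileSystem(anon=True)
    prefix = root.removeprefix("s3://").rstrip("/") + "/"
    # fs.find returns every object key below the prefix (chunk files included),
    # without the "s3://" protocol, so strip the prefix to get relative keys.
    keys = [path[len(prefix):] for path in fs.find(prefix) if path.startswith(prefix)]
    return classify_zarr_keys(keys)
```

Because fs.find walks every key, including all chunk files, it is best pointed at a narrow prefix such as the .../10m_above_ground/UGRD path rather than the whole analysis store.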
Best answer:

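Xarray cannot interpret nested Zarr groups: it expects all the variables and coordinates to live in a single flat group. I think your only option here is to open the groups separately and merge the datasets manually. Have you tried opening both in one call, with file1 and file2 being the S3Map objects for the two URIs above?

ds = xr.open_mfdataset([file1, file2], engine="zarr")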
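A minimal sketch of the manual merge, assuming (as the question's output shows) that the coordinates live in the .../10m_above_ground/UGRD group and the UGRD array lives in the nested .../10m_above_ground/UGRD/10m_above_ground group. merge_groups and open_nested_variable are hypothetical helpers; xr.open_zarr, xr.merge and Dataset.set_coords are standard xarray calls. Promoting the parent group's scalar variables (time, height, ...) to coordinates is a choice made in this sketch, not something xarray does on its own.

```python
import numpy as np
import xarray as xr


def merge_groups(coord_ds: xr.Dataset, data_ds: xr.Dataset) -> xr.Dataset:
    """Attach the coordinates from one Zarr group to the variables of another.

    The data group shares dimension names with the coordinate group but has no
    coordinate variables of its own, so xr.merge lines them up by dimension
    name (the dimension sizes must match).
    """
    merged = xr.merge([coord_ds, data_ds])
    # The parent group stores scalar metadata (time, height, ...) as data
    # variables; promote every variable the data group lacks to a coordinate.
    extra = [name for name in coord_ds.data_vars if name not in data_ds.data_vars]
    return merged.set_coords(extra)


def open_nested_variable(coord_uri: str, data_uri: str) -> xr.Dataset:
    """Lazily open a coordinate group and its nested data group, then merge (needs s3fs)."""
    import s3fs

    fs = s3fs.S3FileSystem(anon=True)
    coord_ds = xr.open_zarr(s3fs.S3Map(coord_uri, s3=fs))
    data_ds = xr.open_zarr(s3fs.S3Map(data_uri, s3=fs))
    return merge_groups(coord_ds, data_ds)


# Usage with the URIs from the question (requires network access):
# base = "s3://hrrrzarr/sfc/20210208/20210208_00z_anl.zarr/10m_above_ground/UGRD"
# ds = open_nested_variable(base, base + "/10m_above_ground")
```

Since open_zarr returns dask-backed arrays, the merge only reads metadata and coordinate values; the UGRD values themselves are fetched when you compute or plot them.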