How to store a subset of Xarray data into Zarr?

Context

In the Xarray documentation section "Appending to existing Zarr stores", the example is as follows:

import numpy as np  # needed for np.arange below
import xarray as xr
import dask.array

# Write zarr with empty structure
dummies = dask.array.zeros(30, chunks=10)
ds = xr.Dataset({"foo": ("x", dummies)})
path = "path/to/directory.zarr"

ds.to_zarr(path, compute=False)

# Write the first 10 values into the matching region of the existing store
ds = xr.Dataset({"foo": ("x", np.arange(30))})
ds.isel(x=slice(0, 10)).to_zarr(path, region={"x": slice(0, 10)})

Question

The example works fine as long as I know the integer slices of the array.

How do I append if I do not know the required region? That is, if I'm given the result of ds.isel(x=slice(0, 10)) without knowing the region slice(0, 10)?

Possible solutions

In theory, I have all the information from the coordinates. For example, for float coordinates, I could do something like

start_index = (ds['x']>=start).argmax().values.item()
end_index = (ds['x']<=end).argmin().values.item()
# region is slice(start_index, end_index)

to determine the isel/zarr indices.
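
One caveat with the argmax/argmin form: if every coordinate value is <= end, the comparison is True everywhere and argmin returns 0, so the slice comes out empty. A minimal sketch of the same idea using the standard-library bisect module avoids that edge case. It assumes the coordinate values are sorted in ascending order, and the helper name range_to_region is hypothetical, not part of Xarray:

```python
from bisect import bisect_left, bisect_right


def range_to_region(coord_values, start, end):
    """Return the integer slice covering start <= value <= end.

    Assumption: coord_values is a sequence sorted in ascending order.
    """
    lo = bisect_left(coord_values, start)  # first index with value >= start
    hi = bisect_right(coord_values, end)   # one past the last index with value <= end
    return slice(lo, hi)


# Illustrative coordinate values, not taken from the example above.
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
inner = range_to_region(x, 0.5, 1.5)       # covers 0.5, 1.0, 1.5
to_the_end = range_to_region(x, 1.0, 10.0)  # end lies beyond the last value
```

With an Xarray dataset, the sorted values could be obtained with ds['x'].values.tolist() before calling the helper.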

However, this gets fairly involved when dealing with a dataset with many dimensions and coordinates of various types (float, string, datetime). This makes me wonder if there is a more straightforward way.

Answer (Ryan)

This is currently not implemented in Xarray, but it has been requested as a feature: https://github.com/pydata/xarray/issues/7702

That issue also explains a workaround very similar to the one you used here.
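
A workaround along these lines can be sketched without any range arithmetic by matching coordinate values directly, which treats float, string, and datetime coordinates the same way. This is only an illustration, not the code from the linked issue: the helper name infer_region is hypothetical, and it assumes that coordinate values are hashable and unique along each dimension and that the subset is one contiguous block of the store.

```python
def infer_region(full_coords, subset_coords):
    """Map each dimension's subset coordinate values to a contiguous slice.

    Both arguments are dicts of dimension name -> sequence of coordinate
    values. Assumption: values are hashable and unique per dimension.
    """
    region = {}
    for dim, subset in subset_coords.items():
        full = list(full_coords[dim])
        subset = list(subset)
        if not subset:
            raise ValueError(f"subset is empty along {dim!r}")
        # Look up where the first subset value sits in the full coordinate.
        position = {value: i for i, value in enumerate(full)}
        if subset[0] not in position:
            raise ValueError(f"{subset[0]!r} not found along {dim!r}")
        start = position[subset[0]]
        stop = start + len(subset)
        # A Zarr region is a plain slice, so the block must be contiguous.
        if full[start:stop] != subset:
            raise ValueError(f"values along {dim!r} are not contiguous in the store")
        region[dim] = slice(start, stop)
    return region


# Illustrative coordinates of mixed types, not taken from the question.
full = {"x": [10.0, 20.0, 30.0, 40.0], "name": ["a", "b", "c"]}
subset = {"x": [20.0, 30.0], "name": ["c"]}
region = infer_region(full, subset)
```

With Xarray, the two dicts could be built as {d: ds[d].values.tolist() for d in ds.dims} for the full dataset and for the subset, and the result passed as subset.to_zarr(path, region=region). Newer Xarray releases also document a region="auto" option for to_zarr that infers the region from coordinates; check the to_zarr documentation for your installed version before relying on it.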

It seems like prioritizing this would be a good idea for the Xarray team.