Context
I have download u- and v- wind components from: ERA5 reanalysis ("reanalysis-era5-pressure-levels", variable observation) and ECMWF's Seasonal Forecast ("seasonal-original-pressure-levels", variable forecast) for the same year (2021), and I would like to use Continuous Ranked Probability Score (CRPS) from properscoring to evaluate the performance of the forecast.
For evaluation, the idea is to use xr.apply_ufunc over number (members) to calculate, for every: time, latitude and longitude; the CRPS, something like:
import properscoring as ps
crps = xr.apply_ufunc(
ps.crps_ensemble,
observation,
forecast,
input_core_dims=[[], ["number"]],
output_core_dims=[[]],
vectorize=True,
)
However, observation is indexed by ("time", "latitude", "longitude"), while forecast is indexed by ("time", "latitude", "longitude", "steps", "number"); making them not compatible (xr.align(..., join = "exact") fails because of both time and steps).
How can I transform observations's time dimension into ("time", "steps")?
What I have tried
Using backend_kwargs
For another project I used xr.open_dataset(..., backend_kwargs={"time_dims": ("valid_time",)}), which made valid_time match with time dimension and things worked out. However, for this current project, there is overlap between valid_time and time with steps (e.g. time=2021-01-01 + steps=36:00:00 > time=2021-01-02), which makes this solution undesirable, as values are lost.
forecast = xr.open_dataset(
"data/raw/forecast.grib",
engine="cfgrib",
backend_kwargs=dict(time_dims=("valid_time",)),
)
# forecast is missing values corresponding to `steps > 24:00:00`
Concatenating into a new dimension
Another solution that I tried implementing is concatenating this rolling window as different steps: for each day, extract the next 36 hours as steps into a new dimension, steps. However, this did neither create a new dimension (maybe xr.concat is not the correct function?) nor was fast enough (took more than 6 minutes for two months).
xr.concat(
[
observation.sel(time = slice(date, date + pd.Timedelta("36H")))
for date in observation.time.values
],
dim = "steps"
)