Resampling a netCDF file with multiple bands in python

211 Views Asked by At

I have daily mean precipitation netCDF file that covers a period of five years. The specifics of the file are:

Dimensions:       (rlat: 412, rlon: 424, time: 1826, bnds: 2)
Coordinates:
    lat           (rlat, rlon) float64 ...
    lon           (rlat, rlon) float64 ...
  * rlat          (rlat) float64 -23.38 -23.26 -23.16 ... 21.61 21.73 21.83
  * rlon          (rlon) float64 -28.38 -28.26 -28.16 ... 17.93 18.05 18.16
  * time          (time) datetime64[ns] 1951-01-01T12:00:00 ... 1955-12-31T12...
Dimensions without coordinates: bnds
Data variables:
    pr            (time, rlat, rlon) float32 ...
    rotated_pole  |S1 ...
    time_bnds     (time, bnds) object ...
Attributes: (12/22)
    CDI:                            Climate Data Interface version 1.3.2
    Conventions:                    CF-1.6
    NCO:                            4.4.2
    CDO:                            Climate Data Operators version 1.3.2 (htt...
    contact:                        Fredrik Boberg, Danish Meteorological Ins...
    creation_date:                  2019-10-15 18:05:48
    ...                             ...
    rcm_version_id:                 v1
    project_id:                     CORDEX
    CORDEX_domain:                  EUR-11
    product:                        output
    tracking_id:                    hdl:21.14103/a879aaf7-ddeb-436a-96fd-b717...
    c3s_disclaimer:                 This data has been produced in the contex...

My goal is to make a "new" dataset from the mean precipitation of every month, in python.

To open the dataset I used the xarray package

eu11 = xr.open_dataset('./GIS_DATA/eur11.nc')

I tried the following line of code

eu11_montly = eu11.resample(time='1MS').mean()

suggested in Find the daily and monthly mean from daily data, but instead it produced the following error:

numpy.core._exceptions._UFuncNoLoopError: ufunc 'add' did not contain a loop with signature matching types (dtype('S1'), dtype('S1')) -> None

My intuition tells me this happens because of the extra 'bnds' dimension of the dataset.

1

There are 1 best solutions below

0
On BEST ANSWER

In your error message, you'll see:

numpy.core._exceptions._UFuncNoLoopError:
ufunc 'add' did not contain a loop with signature
matching types (dtype('S1'), dtype('S1')) -> None

This can be confusing the first few times you see it, but it's a good one to be able to recognize. The most important takeaway from this is that something is tripping over a variable with dtype S1, which is a fixed-width string data type.

The only variable in your Dataset with this type is rotated_pole. This is a dimensionless string array (a constant) - presumably it's just the projection (CRS) information, essentially metadata, stored as a data variable.

Many operations in xarray can be performed on DataArrays/variables or on the entire Dataset. These Dataset operators are there for convenience, but you need to be careful with them, as many new users are surprised by the way they work across all the variables. In general, I'd strongly recommend only doing math with DataArrays/variables, then bundling your results into a Dataset only for documentation & writing.

When you call .resample on the whole dataset, xarray starts by broadcasting your variables against eachother along the time dim, so you end up with a much larger variable rotated_pole (time) |S1, which is 1826 times as large as your original. Next, the new rotated_pole variable is resampled (or at least, this is attempted), and then numpy objects to the idea of doing math with a fixed-width string.

Instead, resample the DataArray:

resampled_precip = eu11.pr.resample(time='1MS').mean()
# now you can return the data to a Dataset:
out_ds = xr.Dataset({"pr": resampled_precip})

Alternatively, you could resample only a subset of the data variables:

out_ds = eu11[["pr"]].resample(time="1MS").mean()

I would not recommend including time_bnds, since the resulting mean bounds will not accurately reflect the monthly frequency.