I want to have multiple coordinate systems for a given dimension, e.g., a single "time" dimension with coordinates for (the default) datetime64[ns] type but also for a numeric year, or a season. This seems to be possible using Dataset.assign_coords() but has some unexpected behavior when subsetting a DataArray within the Dataset.
I'm using the publicly available NASA dataset available here (with Earthdata login); described in more detail at this link.
import xarray as xr
import numpy as np
ds = xr.open_mfdataset('GRCTellus.JPL.200204_202311.GLO.RL06.1M.MSCNv03CRI.nc')
ds.coords['lon'] = ds.coords['lon'] - 180
# Get a list of all the dates from the "time" coordinate
dates = ds.coords['time'].values
# Convert each "datetime64[ns]" object to a 4-digit year
years = []
for each in dates:
years.append(int(str(each)[0:4]))
# Create a new set of coordinates
ds = ds.assign_coords(year = ('year', years))
<xarray.Dataset>
Dimensions: (lat: 360, time: 227, lon: 720, bounds: 2, year: 227)
Coordinates:
* lat (lat) float64 -89.75 -89.25 -88.75 ... 88.75 89.25 89.75
* time (time) datetime64[ns] 2002-04-17T12:00:00 ... 2023-11-16
* lon (lon) float64 -179.8 -179.2 -178.8 ... 178.8 179.2 179.8
* year (year) int64 2002 2002 2002 2002 2002 ... 2023 2023 2023 2023
This works great, so far, in that I have a year coordinate now. I also have a year dimension listed, which is not what I wanted.
The more serious problem is that when I select a DataArray from this Dataset, it doesn't know anything about the new coordinates.
ds['lwe_thickness'].sel(lon = slice(-124, -114), lat = slice(32, 42))
<xarray.DataArray 'lwe_thickness' (time: 227, lat: 20, lon: 20)>
dask.array<getitem, shape=(227, 20, 20), dtype=float64, chunksize=(46, 20, 20), chunktype=numpy.ndarray>
Coordinates:
* lat (lat) float64 32.25 32.75 33.25 33.75 ... 40.25 40.75 41.25 41.75
* time (time) datetime64[ns] 2002-04-17T12:00:00 ... 2023-11-16
* lon (lon) float64 -123.8 -123.2 -122.8 -122.2 ... -115.2 -114.8 -114.2
The result is that "time" is listed as a coordinate, but not "year". This has implications for using DataArray.polyfit(), because without a "year" coordinate I can't get coefficient values denominated by year.
How can I make a new coordinate system available to each DataArray in my Dataset? I have looked at:
- "add new coordinates to an xarray.DataArray" - But when I try this, I get a
ValueError: cannot add coordinates with new dimensions to a DataArray
You are telling it to connect the new coordinate to a newly created dimension 'year'. I didn't try this on your exact data, but try connecting the coordinate 'year' to the existing dimension 'time'
Then the 'year' coordinate should also show up in the DataArrays that are defined over the dimension 'time'.