I have a function that creates an xarray Dataset from various model outputs. One of the pieces of information I collect is a list of lists (not all of the same length). This variable is called cids and has the same dimension, repo_id, as the other variables.
So far the following has always worked fine:
import numpy as np
import pandas as pd
import xarray as xr
datetime = pd.date_range('20010101', periods=100, freq='D')
obs = [xr.DataArray(np.random.rand(100), dims={'datetime': datetime}),xr.DataArray(np.random.rand(100), dims={'datetime':datetime}) ]
cids = [[1, 2, 3], [1, 2, 3, 4]]
keys = np.array([['A', 'A', 'B'], ['C', 'D', 'E']])
xr.Dataset({'obs': (['repo_id', 'datetime'], np.array(obs)), 'cig_id': ('repo_id', keys[:, 0]), 'repo': ('repo_id', keys[:, 2]), 'cids': ('repo_id', cids)}, coords={'repo_id': keys[:, 1], 'datetime': obs[0].datetime})
This yields the following results, as expected:
<xarray.Dataset>
Dimensions:   (datetime: 100, repo_id: 2)
Coordinates:
  * repo_id   (repo_id) <U1 'A' 'D'
  * datetime  (datetime) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99
Data variables:
    obs       (repo_id, datetime) float64 0.9393 0.468 0.7168 ... 0.03513 0.8771
    cig_id    (repo_id) <U1 'A' 'C'
    repo      (repo_id) <U1 'B' 'E'
    cids      (repo_id) object [1, 2, 3] [1, 2, 3, 4]
However, I recently had a case where the lists in my cids variable all had the same length:
datetime = pd.date_range('20010101', periods=100, freq='D')
obs = [xr.DataArray(np.random.rand(100), dims={'datetime': datetime}),xr.DataArray(np.random.rand(100), dims={'datetime':datetime}) ]
# note that both lists in cids now have the same length
cids = [[1, 2, 3], [1, 2, 3]]
keys = np.array([['A', 'A', 'B'], ['C', 'D', 'E']])
xr.Dataset({'obs': (['repo_id', 'datetime'], np.array(obs)), 'cig_id': ('repo_id', keys[:, 0]), 'repo': ('repo_id', keys[:, 2]), 'cids': ('repo_id', cids)}, coords={'repo_id': keys[:, 1], 'datetime': obs[0].datetime})
This produces the following error:
Traceback (most recent call last):
File "/auto/anaconda3/envs/commod_staging/lib/python3.6/site-packages/xarray/core/variable.py", line 107, in as_variable
obj = Variable(*obj)
File "/auto/anaconda3/envs/commod_staging/lib/python3.6/site-packages/xarray/core/variable.py", line 309, in __init__
self._dims = self._parse_dimensions(dims)
File "/auto/anaconda3/envs/commod_staging/lib/python3.6/site-packages/xarray/core/variable.py", line 503, in _parse_dimensions
"number of data dimensions, ndim=%s" % (dims, self.ndim)
ValueError: dimensions ('repo_id',) must have the same length as the number of data dimensions, ndim=2
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/auto/anaconda3/envs/commod_staging/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-48-9a2b518ac4d3>", line 2, in <module>
xr.Dataset({'obs': (['repo_id', 'datetime'], np.array(obs)), 'cig_id': ('repo_id', keys[:, 0]), 'repo': ('repo_id', keys[:, 2]), 'cids': ('repo_id', cids)}, coords={'repo_id': keys[:, 1], 'datetime': obs[0].datetime})
File "/auto/anaconda3/envs/commod_staging/lib/python3.6/site-packages/xarray/core/dataset.py", line 537, in __init__
data_vars, coords, compat="broadcast_equals"
File "/auto/anaconda3/envs/commod_staging/lib/python3.6/site-packages/xarray/core/merge.py", line 467, in merge_data_and_coords
objects, compat, join, explicit_coords=explicit_coords, indexes=indexes
File "/auto/anaconda3/envs/commod_staging/lib/python3.6/site-packages/xarray/core/merge.py", line 552, in merge_core
collected = collect_variables_and_indexes(aligned)
File "/auto/anaconda3/envs/commod_staging/lib/python3.6/site-packages/xarray/core/merge.py", line 277, in collect_variables_and_indexes
variable = as_variable(variable, name=name)
File "/auto/anaconda3/envs/commod_staging/lib/python3.6/site-packages/xarray/core/variable.py", line 113, in as_variable
"{} to Variable.".format(obj)
ValueError: Could not convert tuple of form (dims, data[, attrs, encoding]): ('repo_id', [[1, 2, 3], [1, 2, 3]]) to Variable.
Input would be appreciated; I'm not sure how best to handle this. It seems xarray is trying to be smart and assumes that cids has a dimension of length 3 rather than the repo_id dimension of length 2. Is this a bug?
Currently the first example creates a variable cids whose elements are lists. Is that intentional? Generally you would want to store a single value at each position along a dimension, rather than a list.
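If a single value per position is acceptable, one option is to pad the lists to a common length and give them a second dimension. The following is only a sketch under assumptions: pad_ragged is a hypothetical helper, cid_idx is an invented dimension name, and NaN is assumed to be an acceptable fill value (which forces a float dtype).

```python
import numpy as np
import xarray as xr


def pad_ragged(rows, fill=np.nan):
    """Pad lists of unequal length into a 2-D float array (hypothetical helper)."""
    width = max(len(r) for r in rows)
    out = np.full((len(rows), width), fill, dtype=float)
    for i, r in enumerate(rows):
        # Copy each list into the start of its row; the rest stays as `fill`.
        out[i, :len(r)] = r
    return out


cids = [[1, 2, 3], [1, 2, 3, 4]]
padded = pad_ragged(cids)

# 'cid_idx' is an assumed name for the extra dimension, not something
# taken from the original Dataset.
ds = xr.Dataset(
    {'cids': (('repo_id', 'cid_idx'), padded)},
    coords={'repo_id': ['A', 'D']},
)
print(ds['cids'].shape)
```

Because the variable always has two named dimensions here, it behaves the same way whether or not the input lists happen to have equal lengths.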
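I appreciate it's a confusing pair of cases: it's surprising that this works for lists of unequal length but not for lists of equal length. With equal-length lists, the data becomes a 2-D array, so xarray tries to put the values inside each list along a second dimension, and since only one dimension name (repo_id) was supplied, it fails with the ndim=2 error. With unequal-length lists no such conversion is possible, so each list is stored as a single object along repo_id.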
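The difference between the two cases can be reproduced with NumPy alone. The sketch below also shows one possible workaround if lists as elements really are wanted: build a 1-D object array by hand so the inner lists are never unpacked. to_object_array is a hypothetical helper, not part of xarray or NumPy.

```python
import numpy as np


def to_object_array(rows):
    """Store each list as one element of a 1-D object array (hypothetical helper)."""
    arr = np.empty(len(rows), dtype=object)
    for i, row in enumerate(rows):
        # Integer indexing assigns the list itself instead of broadcasting it.
        arr[i] = row
    return arr


ragged = [[1, 2, 3], [1, 2, 3, 4]]
equal = [[1, 2, 3], [1, 2, 3]]

# Unequal lengths: NumPy can only build a 1-D array of list objects.
print(np.array(ragged, dtype=object).shape)  # (2,)

# Equal lengths: NumPy builds a 2-D array, which is where ndim=2 comes from.
print(np.array(equal).shape)  # (2, 3)

# Workaround: the hand-built array stays 1-D in both cases.
cids_obj = to_object_array(equal)
print(cids_obj.shape)  # (2,)
```

Passing ('repo_id', cids_obj) instead of ('repo_id', cids) should then give the Dataset constructor data with a single dimension, although this is an assumption that has not been checked against every xarray version.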
The error message is unhelpful, but I'm not sure what I'd change in the functionality. Potentially xarray could also raise an error on your first example, given that it's unlikely someone wants a variable whose elements are lists.