I have a netCDF file, data.nc. Running ncdump -h data.nc shows that the dimensions are:
dimensions:
cell = 20480 ;
nv = 3 ;
time = UNLIMITED ; // (12 currently)
In a Jupyter notebook, when I read this file in and examine it:
data = xr.open_dataset('data.nc')
data
I get the expected output of 'Dimensions: (cell: 20480, nv: 3, time: 12)'.
All fine so far. However, if I then save a copy of data as a new netCDF file:
data.to_netcdf(path='data_copy.nc')
ncdump -h data_copy.nc shows:
dimensions:
time = UNLIMITED ; // (12 currently)
cell = 20480 ;
nv = 3 ;
Oddly enough, though, if I read in this copy with:
data_copy = xr.open_dataset('data_copy.nc')
data_copy
I correctly get the same 'Dimensions: (cell: 20480, nv: 3, time: 12)' as the original.
I thought that this might have something to do with netCDF versions, as in this answer.
However, ncdump -k data.nc reports classic, which makes this even stranger: the answer says "there's no way to make time unlimited and have it be the last dimension in a netCDF3 file", yet that is precisely the situation with data.nc, a netCDF3 file with time as the unlimited last dimension.
I have tried several of the format options listed in the xarray.Dataset.to_netcdf documentation, e.g. data.to_netcdf(path='data_copy.nc', format='NETCDF4'), but all of them still show
dimensions:
time = UNLIMITED ; // (12 currently)
cell = 20480 ;
nv = 3 ;
with ncdump, and yet the original order when the file is read back in with xr.open_dataset and examined as a Dataset.
I've also tried passing engine='netcdf4' (when saving as NETCDF4) and unlimited_dims='time', but ncdump shows 'time' first in the saved copy no matter what.
I've read every related question I can think of, but the two most frequent suggestions don't seem applicable. I don't want to reorder dimensions with ncpdq, because that changes those dimensions internally for each variable, not for the file as a whole, and I would rather prevent the problem than correct it. It also doesn't seem like a case for xarray.Dataset.transpose(), because the dimensions are already correct when the data is in Dataset form.
I've also tried reordering the dimensions of data_copy.nc with ncks as outlined here, but ncks -A -v cell data_copy.nc outfile.nc gave me:
ncks: ERROR nco_xtr_mk() reports user-supplied variable name
or regular expression 'cell' is not in and/or does not match
contents of input file
which I don't understand (and in any case, I would rather prevent the reordering than correct it afterwards).
Why does this dimension reordering happen when I save the Dataset as a netCDF file, and how can I prevent it?