I have requested a netCDF subset using Siphon and formed a query to retrieve a variable within a bounding box:
from siphon.catalog import TDSCatalog
cat = TDSCatalog("https://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/Global_onedeg/catalog.xml?dataset=grib/NCEP/GFS/Global_onedeg/Best")
ncss = cat.datasets[0].subset()
query = ncss.query()
query.variables("Absolute_vorticity_isobaric")
query.lonlat_box(north=34., south=33., west=-102., east=-101.)
query.accept("netcdf4")
I am seeking a reliable, concise approach to getting the values of that variable's coordinates, specifically time and vertical level. A working, but impractical approach to this would be to request and work with the whole dataset.
Functional-but-Impractical Approach
Get the data
import xarray as xr
query.all_times()
data = ncss.get_data(query)
datastore = xr.backends.NetCDF4DataStore(data)
Get data as xarray.Dataset
using MetPy's xarray accessor
ds = xr.open_dataset(datastore).metpy.parse_cf()
Get the coordinate axes from a constituent xarray.DataArray
For each variable of the dataset as an xarray.DataArray
, calling ds.VARIABLE.metpy.DIMENSION
has MetPy automatically return the appropriate coordinate variable (no matter what it is named, e.g. lat
, lon
, time
, time1
, altitude_above_msl
, isobaric3
, height_above_ground1
), where DIMENSION
is one of time
, vertical
, x
, and y
.
Get the values
In this case, ds.Absolute_vorticity_isobaric.metpy.time
returns ds.time
, and ds.Absolute_vorticity_isobaric.metpy.vertical
returns ds.isobaric2
. Adding .values
to the call returns just the numpy.ndarray
with the values I have been trying to get. So, calling ds.Absolute_vorticity_isobaric.metpy.time.values
produces the following (which is truncated below):
array(['2019-11-17T00:00:00.000000000', '2019-11-17T03:00:00.000000000',
'2019-11-17T06:00:00.000000000', ..., '2020-01-02T06:00:00.000000000',
'2020-01-02T09:00:00.000000000', '2020-01-02T12:00:00.000000000'],
dtype='datetime64[ns]')
Calling ds.Absolute_vorticity_isobaric.metpy.time.values
and ds.Absolute_vorticity_isobaric.metpy.vertical.values
will return just the NumPy arrays, which is what I seek.
The Problem
While the above does in fact do what I want, it took nearly a minute and a half to run for just one variable, and it (I assume) unnecessarily taxes UCAR servers. Is there any way to get the output above without the massive overhead of loading all of that data itself?
If you are concerned about the performance of your original method, and only wish to extract the time and vertical coordinates, I would recommend using OPENDAP to access your data rather than NCSS. This will simply fetch the metadata at first, and then will lazily load the data that you request (time and vertical coordinates, in your case). Using MetPy v0.11 or newer, an example script using your TDS Catalog of interest would look something like the following:
This takes roughly a half-second to run on my system.
If you instead have MetPy older than v0.11, you will need to use
.metpy.parse_cf()
when opening the dataset, as follows: