How do I use pydap library to collect THREDDS data?

I have been trying to use the example get_nomads.py module from Will Holgren which he was nice enough to forward my way. In the code, there is a call to get the THREDDS data as follows:

from pydap.client import open_url
dataset = open_url('https://nomads.ncdc.noaa.gov/thredds/dodsC/gfs-004/201612/20161201/gfs_4_20161201_0000_003.grb2')

This does not work, apparently because the old THREDDS server has been decommissioned:

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2018.2.4\helpers\pydev\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<input>", line 1, in <module>
  File "C:\Users\pmoran\jira\slf\venv\lib\site-packages\pydap\client.py", line 64, in open_url
    dataset = DAPHandler(url, application, session, output_grid).dataset
  File "C:\Users\pmoran\jira\slf\venv\lib\site-packages\pydap\handlers\dap.py", line 51, in __init__
    raise_for_status(r)
  File "C:\Users\pmoran\jira\slf\venv\lib\site-packages\pydap\net.py", line 30, in raise_for_status
    comment=response.body
webob.exc.HTTPError: 404 Not Found

Looking around, I have not been able to find a THREDDS server that supports this method of data access.

BTW: I am able to get data as follows:

url = 'http://dtvirt5.deltares.nl:8080/thredds/dodsC/opendap/rijkswaterstaat/jarkus/profiles/transect.nc'
dataset = open_url(url)
<DatasetType with children 'id', 'areacode', 'areaname', 'alongshore', 'cross_shore', 'time', 'time_bounds', 'epsg', 'x', 'y', 'lat', 'lon', 'angle', 'mean_high_water', 'mean_low_water', 'max_cross_shore_measurement', 'min_cross_shore_measurement', 'nsources', 'max_altitude_measurement', 'min_altitude_measurement', 'rsp_x', 'rsp_y', 'rsp_lat', 'rsp_lon', 'time_topo', 'time_bathy', 'origin', 'altitude'>
variable = dataset['id']
print(variable[0:10])
[2000100 2000101 2000102 2000103 2000104 2000105 2000106 2000120 2000140
 2000160]

I also see that I can manually download the data from https://www.ncei.noaa.gov/thredds/dodsC/gfs-g4-anl-files/201808/20180828/gfsanl_4_20180828_1800_006.grb2.html

But I can't seem to find the argument format to download the data using pydap. I think all I need is a pointer to a real THREDDS server that has the appropriate DDS and DAS files at the same URI location.

Does anyone know how to get the GFS4 GRB files using the pydap client?

Thanks

There are 2 best solutions below

Eric, when I try the link you provided with pydap, I get this error:

dataset = open_url('http://www.ncei.noaa.gov/thredds/dodsC/gfs-g4-anl-files/201612/20161201/gfsanl_4_20161201_0000_003.grb2')
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2018.2.4\helpers\pydev\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<input>", line 1, in <module>
  File "C:\Users\pmoran\jira\slf\venv\lib\site-packages\pydap\client.py", line 64, in open_url
    dataset = DAPHandler(url, application, session, output_grid).dataset
  File "C:\Users\pmoran\jira\slf\venv\lib\site-packages\pydap\handlers\dap.py", line 64, in __init__
    self.dataset = build_dataset(dds)
  File "C:\Users\pmoran\jira\slf\venv\lib\site-packages\pydap\parsers\dds.py", line 161, in build_dataset
    return DDSParser(dds).parse()
  File "C:\Users\pmoran\jira\slf\venv\lib\site-packages\pydap\parsers\dds.py", line 49, in parse
    self.consume('dataset')
  File "C:\Users\pmoran\jira\slf\venv\lib\site-packages\pydap\parsers\dds.py", line 41, in consume
    token = super(DDSParser, self).consume(regexp)
  File "C:\Users\pmoran\jira\slf\venv\lib\site-packages\pydap\parsers\__init__.py", line 182, in consume
    raise Exception("Unable to parse token: %s" % self.buffer[:10])
Exception: Unable to parse token: <!DOCTYPE 

However, per your suggestion, I'm able to get data with netCDF4. Here's what I did:

>>> import netCDF4
>>> nc = netCDF4.Dataset('http://www.ncei.noaa.gov/thredds/dodsC/gfs-g4-anl-files/201612/20161201/gfsanl_4_20161201_0000_003.grb2')
>>> nc.variables.keys()
odict_keys(['LatLon_Projection', 'lat', 'lon', 'reftime', 'time', 'time_bounds', 
...  
'v-component_of_wind_altitude_above_msl', 'v-component_of_wind_height_above_ground', 'v-component_of_wind_tropopause', 'v-component_of_wind_sigma'])

That seems to work. Not sure what's wrong with pydap.
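One way to narrow this down: the `Unable to parse token: <!DOCTYPE` exception above suggests that pydap received an HTML page where it expected a DDS document, since a DDS starts with the keyword `Dataset`. Below is a minimal sketch for checking what the server returns at the `.dds` address. The helper names `dds_url`, `looks_like_dds`, and `check_endpoint` are made up for illustration and are not part of pydap, and it assumes the server follows the usual OPeNDAP convention of serving the DDS at the dataset URL plus `.dds`.

```python
from urllib.request import urlopen


def dds_url(opendap_url):
    """Return the DDS address for an OPeNDAP dataset URL (hypothetical helper)."""
    # Drop a trailing '.html' (the browser access form) if present, then add '.dds'.
    if opendap_url.endswith(".html"):
        opendap_url = opendap_url[:-len(".html")]
    return opendap_url + ".dds"


def looks_like_dds(text):
    """A DDS document begins with 'Dataset'; an HTML page begins with a tag."""
    return text.lstrip().lower().startswith("dataset")


def check_endpoint(opendap_url, timeout=30):
    """Fetch the start of the DDS response and report whether it looks like a DDS.

    Needs network access; urlopen raises HTTPError for responses such as 404.
    """
    with urlopen(dds_url(opendap_url), timeout=timeout) as response:
        body = response.read(200).decode("utf-8", errors="replace")
    return looks_like_dds(body), body
```

Calling `check_endpoint(...)` with the same URL passed to `open_url` returns a flag plus the first 200 bytes of the response, which should show whether the server sent a DDS or an HTML page.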

1
On

I haven't tested this with pydap; I tested with netCDF4, which does very well with THREDDS. This should work using pydap:

dataset = open_url('http://www.ncei.noaa.gov/thredds/dodsC/gfs-g4-anl-files/201612/20161201/gfsanl_4_20161201_0000_003.grb2')

The THREDDS OPeNDAP form for that one file is here:

The main catalog organized by YYYYMM/ is at: https://www.ncei.noaa.gov/thredds/catalog/gfs-g4-anl-files/catalog.html
All NCEI GFS datasets with links to TDS access can be seen here:
https://www.ncdc.noaa.gov/data-access/model-data/model-datasets/global-forcast-system-gfs
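The file URLs in this thread follow a regular pattern (`YYYYMM/YYYYMMDD/gfsanl_4_YYYYMMDD_HHMM_FFF.grb2`), so the OPeNDAP address can be built from a run time and a forecast hour. Here is a rough sketch; the function name `gfs_anl_url` is hypothetical, and the pattern is inferred only from the two example URLs above, so it is worth confirming against the catalog before relying on it.

```python
from datetime import datetime

# Base path taken from the example URLs in this thread.
BASE = "https://www.ncei.noaa.gov/thredds/dodsC/gfs-g4-anl-files"


def gfs_anl_url(run_time, forecast_hour, base=BASE):
    """Build the OPeNDAP URL for one GFS analysis file (hypothetical helper).

    run_time is the model run as a datetime; forecast_hour is an int such as 3 or 6.
    """
    return (
        f"{base}/{run_time:%Y%m}/{run_time:%Y%m%d}/"
        f"gfsanl_4_{run_time:%Y%m%d_%H%M}_{forecast_hour:03d}.grb2"
    )


url = gfs_anl_url(datetime(2016, 12, 1, 0, 0), 3)
print(url)
```

The resulting string can be passed to `netCDF4.Dataset(url)` or to pydap's `open_url(url)` in the same way as the hard-coded URLs above.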