Loading a GRIB from the web in Python without saving the file locally

1.2k Views Asked by At

I would like to download a GRIB file from the web:

Opt1: https://noaa-gfs-bdp-pds.s3.amazonaws.com/gfs.20210801/12/atmos/gfs.t12z.pgrb2.1p00.f000

Opt2: https://www.ncei.noaa.gov/data/global-forecast-system/access/grid-003-1.0-degree/analysis/202108/20210801/gfs_3_20210801_1200_000.grb2 (FYI, data is identical)

A similar question is asked here: How can a GRIB file be opened with pygrib without first downloading the file? However, ultimately a final solution isn't given as the data source has multiple responses (Causing issues). When I tried to recreate the code:

url = 'https://noaa-gfs-bdp-pds.s3.amazonaws.com/gfs.20210801/12/atmos/gfs.t12z.pgrb2.1p00.f000'
urllib.request.install_opener(opener)

req = urllib.request.Request(url,headers={'User-Agent': 'Mozilla/5.0'})
data = urllib.request.urlopen(req, timeout = 300)

import pygrib 
grib = pygrib.fromstring(data.read())

My error is different:

ECCODES ERROR   :  grib_handle_new_from_message: No final 7777 in message!
Traceback (most recent call last):
  File "grib_data_test.py", line 139, in <module>
    grib = pygrib.fromstring(data.read())
  File "src\pygrib\_pygrib.pyx", line 627, in pygrib._pygrib.fromstring
  File "src\pygrib\_pygrib.pyx", line 1390, in pygrib._pygrib.gribmessage._set_projparams
  File "src\pygrib\_pygrib.pyx", line 1127, in pygrib._pygrib.gribmessage.__getitem__
RuntimeError: b'Key/value not found'

I have also tried to work directly with the osgeo.gdal library (preferable for my project as it is already in use in the project). Documentation: https://gdal.org/user/virtual_file_systems.html#vsimem-in-memory-files:

Attempt1: url = "/vsicurl/https://noaa-gfs-bdp-pds.s3.amazonaws.com/gfs.20210801/12/atmos/gfs.t12z.pgrb2.0p50.f000"
Attempt2: url = "/vsis3/https://noaa-gfs-bdp-pds.s3.amazonaws.com/gfs.20210801/12/atmos/gfs.t12z.pgrb2.0p50.f000"
Attempt3: url = "/vsicurl/https://www.ncei.noaa.gov/data/global-forecast-system/access/grid-003-1.0-degree/analysis/202108/20210801/gfs_3_20210801_0000_000.grb2"

grib = gdal.Open(url)

Errors:

Attempt1: ERROR 11: CURL error: schannel: next InitializeSecurityContext failed: Unknown error (0x80092012) - The revocation function was unable to check revocation for the certificate.
Attempt2: ERROR 11: HTTP response code: 0
Attempt3: ERROR 4: /vsicurl/https://www.ncei.noaa.gov/data/global-forecast-system/access/grid-003-1.0-degree/analysis/202108/20210801/gfs_3_20210801_0000_000.grb2 is a grib file, but no raster dataset was successfully identified.

For Attempt1 and 2: I feel like both attempts should work here - normally you do not need any specific AWS Credentials / connection to access the publically available s3 bucket data (As it is accessed using urllib above).

For Attempt3, there is a similar issue here: https://gis.stackexchange.com/questions/395867/opening-a-grib-from-the-web-with-gdal-in-python-using-vsicurl-throws-error-on-m However I do not experience the same issue, my Output:

gdalinfo /vsicurl/https://www.ncei.noaa.gov/data/global-forecast-system/access/grid-003-1.0-degree/analysis/202108/20210801/gfs_3_20210801_0000_000.grb2 --debug on --config CPL_CURL_VERBOSE YES

HTTP: libcurl/7.78.0 Schannel zlib/1.2.11 libssh2/1.9.0
HTTP: GDAL was built against curl 7.77.0, but is running against 7.78.0.
* Couldn't find host www.ncei.noaa.gov in the (nil) file; using defaults
*   Trying 205.167.25.177:443...
* Connected to www.ncei.noaa.gov (205.167.25.177) port 443 (#0)
* schannel: disabled automatic use of client certificate
> HEAD /data/global-forecast-system/access/grid-003-1.0-degree/analysis/202108/20210801/gfs_3_20210801_0000_000.grb2 HTTP/1.1
Host: www.ncei.noaa.gov
Accept: */*

* schannel: server closed the connection
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Mon, 20 Sep 2021 14:38:59 GMT
< Server: Apache
< Strict-Transport-Security: max-age=31536000
< Last-Modified: Sun, 01 Aug 2021 04:46:49 GMT
< ETag: "287c05a-5c87824012f22"
< Accept-Ranges: bytes
< Content-Length: 42451034
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Headers: X-Requested-With, Content-Type
< Connection: close
<
* Closing connection 0
* schannel: shutting down SSL/TLS connection with www.ncei.noaa.gov port 443
VSICURL: GetFileSize(https://www.ncei.noaa.gov/data/global-forecast-system/access/grid-003-1.0-degree/analysis/202108/20210801/gfs_3_20210801_0000_000.grb2)=42451034  response_code=200
VSICURL: Downloading 0-16383 (https://www.ncei.noaa.gov/data/global-forecast-system/access/grid-003-1.0-degree/analysis/202108/20210801/gfs_3_20210801_0000_000.grb2)...
* Couldn't find host www.ncei.noaa.gov in the (nil) file; using defaults
* Hostname www.ncei.noaa.gov was found in DNS cache
*   Trying 205.167.25.177:443...
* Connected to www.ncei.noaa.gov (205.167.25.177) port 443 (#1)
* schannel: disabled automatic use of client certificate
> GET /data/global-forecast-system/access/grid-003-1.0-degree/analysis/202108/20210801/gfs_3_20210801_0000_000.grb2 HTTP/1.1
Host: www.ncei.noaa.gov
Accept: */*
Range: bytes=0-16383

* schannel: failed to decrypt data, need more data
* Mark bundle as not supporting multiuse
< HTTP/1.1 206 Partial Content
< Date: Mon, 20 Sep 2021 14:39:00 GMT
< Server: Apache
< Strict-Transport-Security: max-age=31536000
< Last-Modified: Sun, 01 Aug 2021 04:46:49 GMT
< ETag: "287c05a-5c87824012f22"
< Accept-Ranges: bytes
< Content-Length: 16384
< Content-Range: bytes 0-16383/42451034
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Headers: X-Requested-With, Content-Type
< Connection: close
<
* schannel: failed to decrypt data, need more data
* schannel: failed to decrypt data, need more data
* schannel: failed to decrypt data, need more data
* Closing connection 1
* schannel: shutting down SSL/TLS connection with www.ncei.noaa.gov port 443
VSICURL: Got response_code=206
VSICURL: Downloading 81920-98303 (https://www.ncei.noaa.gov/data/global-forecast-system/access/grid-003-1.0-degree/analysis/202108/20210801/gfs_3_20210801_0000_000.grb2)...
* Couldn't find host www.ncei.noaa.gov in the (nil) file; using defaults
* Hostname www.ncei.noaa.gov was found in DNS cache
*   Trying 205.167.25.177:443...
* Connected to www.ncei.noaa.gov (205.167.25.177) port 443 (#2)
* schannel: disabled automatic use of client certificate
* schannel: failed to receive handshake, SSL/TLS connection failed
* Closing connection 2
VSICURL: DownloadRegion(https://www.ncei.noaa.gov/data/global-forecast-system/access/grid-003-1.0-degree/analysis/202108/20210801/gfs_3_20210801_0000_000.grb2): response_code=0, msg=schannel: failed to receive handshake, SSL/TLS connection failed
VSICURL: Got response_code=0
GRIB: ERROR: Ran out of file in Section 7
ERROR: Problems Jumping past section 7

ERROR 4: /vsicurl/https://www.ncei.noaa.gov/data/global-forecast-system/access/grid-003-1.0-degree/analysis/202108/20210801/gfs_3_20210801_0000_000.grb2 is a grib file, but no raster dataset was successfully identified.
gdalinfo failed - unable to open '/vsicurl/https://www.ncei.noaa.gov/data/global-forecast-system/access/grid-003-1.0-degree/analysis/202108/20210801/gfs_3_20210801_0000_000.grb2'.

Basically. I can currently download the files using urllib, then:

file_out = open(<file.path>, 'w')
file_out.write(data.read())
grib = gdal.Open(file_out)

does what I require. But there is a desire to not save the files locally during the temporary processing moment due to the system we are working within.

Python Versions:

gdal                      3.3.1            py38hacca965_1    defaults
pygrib                    2.1.4            py38hceae430_0    defaults
python                    3.8.10          h7840368_1_cpython    defaults

Cheers.

1

There are 1 best solutions below

0
On

This worked for me. I was using the GEFS data hosted on AWS instead though. I believe there is GFS on AWS also. No account should be needed, so it would just be a matter of changing the bucket and s3_object names to point to GFS data instead

import pygrib
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))
bucket_name = 'noaa-gefs-pds'
s3_object = 'gefs.20220425/00/atmos/pgrb2ap5/gec00.t00z.pgrb2a.0p50.f000'

obj = s3.get_object(Bucket=bucket_name, Key=s3_object)['Body'].read()
grbs = pygrib.fromstring(obj)
print(type(grbs))