Stream HTTP content in Python but skip downloading some lines entirely

Edit: This is partially solved. I have not worked out the exact implementation details yet, but the answer is to use HTTP Range headers, as in Ezequiel's comment.

In case my explanation is not clear enough, I am trying to replicate in Python the procedure described here: https://www.cpc.ncep.noaa.gov/products/wesley/fast_downloading_grib.html

Edit: On a friend's kind advice, I've figured out part of the solution. I just need to grab a specific byte range with my GET request; that's all that NOAA's Perl scripts are doing.
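To make the byte-range idea concrete, here is a minimal sketch of a GET request with a Range header. The helper names range_header and fetch_range are made up for illustration, and the sketch assumes the server honours Range requests (it should answer 206 Partial Content if it does).

```python
def range_header(start, end=None):
    """Build an HTTP Range header for bytes start..end (both inclusive).

    With end=None the range is open-ended: from `start` to the end of the file.
    """
    if start < 0 or (end is not None and end < start):
        raise ValueError("invalid byte range")
    last = "" if end is None else str(end)
    return {"Range": f"bytes={start}-{last}"}


def fetch_range(url, start, end=None):
    """Download only the requested bytes of `url` (hypothetical helper)."""
    # Third-party library; imported here so range_header works without it.
    import requests

    resp = requests.get(url, headers=range_header(start, end), timeout=60)
    # 206 Partial Content means the server honoured the range.
    # A plain 200 means it ignored the header and sent the whole file.
    if resp.status_code != 206:
        raise RuntimeError(f"expected 206 Partial Content, got {resp.status_code}")
    return resp.content
```

For example, fetch_range(url, 0, 999) would ask for only the first 1000 bytes of the file. Checking for status 206 matters because a server that ignores the header will quietly send everything.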

I'm trying to download only a few fields from a "GRIB" file, an array-like format that the National Weather Service uses. The file is at a specific HTTPS URL, e.g. https://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gfs.20201209/00/gfs.t00z.pgrb2.0p25.f000. Specifically, I need to download only the lines that are relevant to me, e.g. lines 5, 10, and 30. I'd like to avoid downloading the content of the other lines at all, but I'm not sure about the low-level behavior of the requests library here (or a suitable alternative).
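As far as I understand the NOAA page, the byte ranges come from a small inventory file that is downloaded first. The sketch below assumes each inventory line looks like number:start_byte:date:variable:level:forecast: with integer record numbers; that layout is my assumption about the wgrib2 short inventory, so check it against a real file. The function name byte_ranges and the sample text are made up for illustration.

```python
def byte_ranges(inventory_text, wanted):
    """Return (start, end) byte ranges for the inventory records in `wanted`.

    `wanted` holds 1-based record numbers. `end` is None for the last record,
    meaning "read to the end of the file".
    """
    starts = []
    for line in inventory_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(":")
        # Assumed layout: field 0 is the record number, field 1 its start byte.
        starts.append((int(fields[0]), int(fields[1])))

    ranges = []
    for i, (number, start) in enumerate(starts):
        if number in wanted:
            # A record ends one byte before the next record starts.
            end = starts[i + 1][1] - 1 if i + 1 < len(starts) else None
            ranges.append((start, end))
    return ranges


# Made-up inventory for illustration only; the offsets are not real.
SAMPLE = """\
1:0:d=2020120900:PRMSL:mean sea level:anl:
2:1000:d=2020120900:CLWMR:1 hybrid level:anl:
3:1500:d=2020120900:ICMR:1 hybrid level:anl:
"""

print(byte_ranges(SAMPLE, {1, 3}))  # [(0, 999), (1500, None)]
```

Each (start, end) pair can then be sent as a Range header of the form bytes=start-end, leaving end empty when it is None. The results are appended to one output file in order.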

There is 1 best solution below

This should be the code:

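import requests

url = 'https://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gfs.20201209/00/gfs.t00z.pgrb2.0p25.f000'
req = requests.get(url, stream=True)
lines = req.iter_lines()  # lazy iterator over the body, one line (bytes) at a time
next(lines)               # read and discard the first line (it is still downloaded)
x2 = next(lines)          # keep the second line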