Use smart_open to download a .gz stream over HTTP and upload it to an S3 bucket

I would like to stream-download a .txt.gz file over HTTP and stream-upload it to an S3 bucket. I've gotten this far, but it doesn't work. What am I missing?

from smart_open import open as sopen

chunk_size = (16 * 1024 * 1024)
http_url = 'http://someurl'

with sopen(http_url, 'rb', transport_params={'headers': {'Subscription-Key': 'somekey'}}) as fin:
    with sopen('s3://bucket/filename.txt.gz', 'wb') as fout:
        while True:
            buf = fin.read(chunk_size)
            if not buf:
                break
            fout.write(chunk_size)

There is 1 best solution below

Turns out it may be a lot simpler than I was making it.

I am not sure, though, whether smart_open is decompressing and re-compressing the file under the hood.

from smart_open import open as sopen

http_url = 'http://someurl'

with sopen(http_url, 'rb', transport_params={'headers' : {'Subscription-Key': 'somekey'}}) as fin:    
    with sopen('s3://bucket/filename.txt.gz', 'wb') as fout: 
        for line in fin:
            fout.write(line)
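
For what it's worth, the loop in the question most likely fails because it passes chunk_size (an int) to fout.write() instead of buf. On the compression question: as far as I can tell, smart_open infers compression from the file extension by default, so writing to a key ending in .gz would gzip the bytes again unless that is switched off. Below is a minimal sketch of a chunked copy that leaves the bytes untouched. The helper names copy_stream and transfer are made up for illustration, and compression='disable' assumes smart_open 5.1 or newer (older releases used ignore_ext=True, if I remember correctly).

```python
import io

CHUNK_SIZE = 16 * 1024 * 1024  # 16 MiB, same as in the question


def copy_stream(fin, fout, chunk_size=CHUNK_SIZE):
    """Copy bytes from fin to fout in fixed-size chunks; return bytes copied."""
    total = 0
    while True:
        buf = fin.read(chunk_size)
        if not buf:
            break
        fout.write(buf)  # write the data that was read, not chunk_size
        total += len(buf)
    return total


def transfer(http_url, s3_url, headers=None):
    """Hypothetical helper: stream http_url to s3_url byte for byte."""
    # Imported here so copy_stream stays usable without smart_open installed.
    from smart_open import open as sopen

    params = {'headers': headers} if headers else None
    # Assumption: smart_open >= 5.1, where compression='disable' turns off
    # the extension-based gzip handling on both the read and the write side.
    with sopen(http_url, 'rb', compression='disable', transport_params=params) as fin:
        with sopen(s3_url, 'wb', compression='disable') as fout:
            return copy_stream(fin, fout)


# Local check with in-memory streams (no network or S3 needed).
src, dst = io.BytesIO(b'x' * 1000), io.BytesIO()
copied = copy_stream(src, dst, chunk_size=256)
```

With the URLs from the question this would be called as transfer('http://someurl', 's3://bucket/filename.txt.gz', headers={'Subscription-Key': 'somekey'}). Reading fixed-size chunks also avoids splitting binary gzip data on newline bytes, which is what the for line in fin loop does.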