I would like to stream-download a .txt.gz file over HTTP and stream-upload it to an S3 bucket. I've gotten this far, but it doesn't work. What am I missing?
from smart_open import open as sopen

chunk_size = 16 * 1024 * 1024  # copy in 16 MiB chunks
http_url = 'http://someurl'

with sopen(http_url, 'rb', transport_params={'headers': {'Subscription-Key': 'somekey'}}) as fin:
    with sopen('s3://bucket/filename.txt.gz', 'wb') as fout:
        while True:
            buf = fin.read(chunk_size)
            if not buf:
                break
            # Write the bytes just read; the original wrote chunk_size (an int), which fails.
            fout.write(buf)
It turns out this may be a lot simpler than I was making it.
However, I am not sure whether smart_open is decompressing and re-compressing the file under the hood.
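
On the compression question: by default smart_open infers compression from the file extension, so a path ending in .gz is transparently gzip-decoded on read and gzip-encoded on write. Here is a minimal sketch of a byte-for-byte passthrough that switches this off on both ends. It assumes smart_open 5.1 or later, where the compression='disable' argument exists (older releases used ignore_ext=True instead). The helper names copy_stream and http_to_s3 are made up for illustration, and the URL, bucket and key are the placeholders from the question.

```python
import shutil


def copy_stream(fin, fout, chunk_size=16 * 1024 * 1024):
    """Copy a binary file-like object to another in fixed-size chunks."""
    # shutil.copyfileobj runs the same read/write loop as the hand-written version.
    shutil.copyfileobj(fin, fout, length=chunk_size)


def http_to_s3(http_url, s3_url, headers=None):
    """Stream an HTTP download into S3 without touching the gzip payload.

    Hypothetical helper: assumes smart_open >= 5.1 is installed and that
    AWS credentials are available to boto3 in the usual way.
    """
    from smart_open import open as sopen  # third-party, imported lazily

    transport_params = {'headers': headers} if headers else None
    # compression='disable' keeps smart_open from decoding or encoding gzip,
    # so the already-compressed bytes are passed through unchanged.
    with sopen(http_url, 'rb', compression='disable',
               transport_params=transport_params) as fin, \
         sopen(s3_url, 'wb', compression='disable') as fout:
        copy_stream(fin, fout)
```

It would be called as http_to_s3('http://someurl', 's3://bucket/filename.txt.gz', headers={'Subscription-Key': 'somekey'}). Without compression='disable' on the write side, the .gz extension on the S3 key would likely cause the data to be gzip-compressed a second time.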