Python aiohttp module: ambiguous .content attribute


Here is a little code snippet:

import aiohttp
import aiofiles

async def fetch(url):
    # starting a session
    async with aiohttp.ClientSession() as session:
        # starting a get request
        async with session.get(url) as response:
            # getting response content
            content = await response.content
            return content
 
async def save_file(file_name, content):
    async with aiofiles.open(f'./binary/{file_name}', 'wb') as f:
        while True:
            chunk = await content.read(1024)
            if not chunk:
                break
            await f.write(chunk)

I am trying to download some binary files using the aiohttp library and then pass them to a coroutine that uses the aiofiles library to write the files to disk. I have read the documentation, but I still can't figure out whether I can pass content = await response.content around, or whether it is closed once the async with handle is closed, because on another blog I found:

According to aiohttp’s documentation, because the response object was created in a context manager, it technically calls release() implicitly.

This confuses me: should I embed the logic of the second function inside the response handle, or is my logic correct?
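
To make the embedding option concrete, I guess merging the two functions would look roughly like this, with everything consumed while the response is still open:

async def fetch_and_save(url, file_name):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            # the stream is read inside the response's context manager
            async with aiofiles.open(f'./binary/{file_name}', 'wb') as f:
                while True:
                    chunk = await response.content.read(1024)
                    if not chunk:
                        break
                    await f.write(chunk)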


2 Answers

Accepted answer

Thanks to user4815162342's code I could find a solution by parallelizing the fetch and write coroutines. I would've marked his code as the accepted solution, but since I had to add some code to make it work, here it is:

import asyncio
import aiohttp
import aiofiles

# fetch binary from server
async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            async for chunk in response.content.iter_chunked(4096):
                yield chunk

# write binary function
async def save_file(file_name, chunk_iter):
    # create the output directory tree (helper functions defined elsewhere in my project)
    list(map(create_dir_tree, list_binary_sub_dirs))
    async with aiofiles.open(f'./binary/bin_ts/{file_name}', 'wb') as f:
        async for chunk in chunk_iter:
            await f.write(chunk)
    

async def main(urls):
    tasks = []
    for url in urls:
        print('running on sublist')
        file_name = url.rpartition('/')[-1]
        request_ts = fetch(url)
        tasks.append(save_file(file_name, request_ts))
    await asyncio.gather(*tasks)

asyncio.run(main(some_list_of_urls))

Other answer

The async context manager will close the resources related to the request, so if you return from the function, you have to make sure you've read everything of interest. That leaves you with two options:

  1. read the entire response into memory while it is still open, e.g. with content = await response.read() (a minimal sketch of this follows the list), or
  2. if the file doesn't fit into memory (and also if you want to speed things up by reading and writing in parallel), use a queue or an async iterator to parallelize reading and writing (a queue-based sketch appears at the end of this answer).
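
A minimal sketch of option #1, reusing the question's function names (the whole payload is buffered in memory, so it only suits files that comfortably fit):

async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            # read() buffers the complete body, so the returned bytes
            # stay valid after the response is closed
            return await response.read()

async def save_file(file_name, content):
    async with aiofiles.open(f'./binary/{file_name}', 'wb') as f:
        await f.write(content)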

Here is an untested implementation of #2:

async def fetch(url):
    # return an async generator over contents of URL
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            # getting response content in chunks no larger than 4K
            async for chunk in response.content.iter_chunked(4096):
                yield chunk
 
async def save_file(file_name, content_iter):
    async with aiofiles.open(f'./binary/{file_name}', 'wb') as f:
        async for chunk in content_iter:
            await f.write(chunk)  # aiofiles write() is a coroutine

async def main():
    await save_file(file_name, fetch(url))
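
For completeness, here is a rough, equally untested sketch of the queue-based variant mentioned in option #2: one task feeds chunks from the response into an asyncio.Queue while a second task drains the queue to disk, with None used as an end-of-stream marker (the function names are just illustrative; the 4096-byte chunk size and the ./binary/ path mirror the snippets above).

import asyncio
import aiohttp
import aiofiles

async def fetch_to_queue(url, queue):
    # producer: stream the response into the queue in 4K chunks
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            async for chunk in response.content.iter_chunked(4096):
                await queue.put(chunk)
    await queue.put(None)  # end-of-stream marker

async def write_from_queue(file_name, queue):
    # consumer: drain the queue to disk until the marker arrives
    async with aiofiles.open(f'./binary/{file_name}', 'wb') as f:
        while True:
            chunk = await queue.get()
            if chunk is None:
                break
            await f.write(chunk)

async def download(url, file_name):
    queue = asyncio.Queue(maxsize=16)  # bound memory while reading and writing run in parallel
    await asyncio.gather(
        fetch_to_queue(url, queue),
        write_from_queue(file_name, queue),
    )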