I have an IO operation (a POST request for each Pandas row) which I'm trying to speed up using asyncio. A minimal stand-in example can be found below. I want to understand why the 1st sample doesn't run in parallel and why the 2nd sample is faster.
1st Sample:
from time import sleep
import asyncio
import nest_asyncio

nest_asyncio.apply()

async def add(x: int, y: int, delay: int):
    sleep(delay)  # await asyncio.sleep(delay)
    return x + y

async def get_results():
    inputs = [(2, 3, 9), (4, 5, 7), (6, 7, 5), (8, 9, 3)]
    cors = [add(x, y, z) for x, y, z in inputs]
    results = asyncio.gather(*cors)
    print(await results)

asyncio.run(get_results())
# This takes ~24s
2nd Sample:
from time import sleep
import asyncio
import nest_asyncio

nest_asyncio.apply()

async def add(x: int, y: int, delay: int):
    await asyncio.sleep(delay)
    return x + y

async def get_results():
    inputs = [(2, 3, 9), (4, 5, 7), (6, 7, 5), (8, 9, 3)]
    cors = [add(x, y, z) for x, y, z in inputs]
    results = asyncio.gather(*cors)
    print(await results)

asyncio.run(get_results())
# This takes ~9s as expected
In my case, the sleep can be replaced with requests.post().
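requests isn't async aware, so trying to use it in an async function doesn't make anything faster. You will need an async-aware HTTP library such as httpx or aiohttp to make HTTP requests that don't block async functions.

Similarly, in the first example, you're using a non-async-aware function, time.sleep, which blocks the event loop. Each call has to finish before the next one can start, so the delays add up: 9 + 7 + 5 + 3 = 24 seconds. In the second example, await asyncio.sleep hands control back to the loop, so all four waits overlap and the total is the longest delay, 9 seconds.

Also, a non-IO, Python-native-code operation (an addition) will not be sped up by asyncio.

(Recall that async doesn't mean that the functions would be run in parallel, quite the contrary: coroutines take turns on a single thread and only switch at an await.)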
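
If switching to an async HTTP library isn't practical, one alternative (not the approach described above, just a related option) is to hand the blocking call to a worker thread with asyncio.to_thread, available in Python 3.9+. Below is a minimal sketch under that assumption. blocking_add and timed_run are hypothetical names made up for illustration, and time.sleep stands in for a blocking call such as requests.post(); the delays are scaled down to fractions of a second.

```python
import asyncio
import time


def blocking_add(x: int, y: int, delay: float) -> int:
    # Hypothetical stand-in for a blocking call such as requests.post().
    time.sleep(delay)
    return x + y


async def get_results(inputs):
    # asyncio.to_thread runs each blocking call in a worker thread,
    # so the event loop is not blocked while the calls wait.
    tasks = [asyncio.to_thread(blocking_add, x, y, d) for x, y, d in inputs]
    # gather returns the results in the same order as the inputs.
    return await asyncio.gather(*tasks)


def timed_run(inputs):
    # Helper for illustration only: run the coroutine and measure wall time.
    start = time.perf_counter()
    results = asyncio.run(get_results(inputs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = timed_run(
        [(2, 3, 0.9), (4, 5, 0.7), (6, 7, 0.5), (8, 9, 0.3)]
    )
    print(results, f"{elapsed:.1f}s")
```

With this pattern the total time should be close to the longest single delay rather than the sum, because the waits overlap in separate threads. It relies on the blocking call releasing the GIL while it waits, which is the case for sleeping and for network IO, and it will not help CPU-bound work.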