How to keep requesting a set of URLs asynchronously (Python)?


I have a set of URLs (same HTTP server, different request parameters). I want to keep requesting all of them, asynchronously or in parallel, until I kill the process.

I started by using threading.Thread() to create one thread per URL, with a while True: loop in the requesting function. As expected, this was already faster than a single thread making one request at a time, but I would like better performance.
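For reference, the thread-per-URL approach might look roughly like the minimal sketch below. `fetch_once` is a hypothetical placeholder for the real HTTP call (e.g. `requests.get(url, proxies=...)`), and a `threading.Event` is used instead of a bare `while True:` so the workers can be stopped cleanly rather than killed; both are assumptions, not the original code:

```python
import threading
import time


def fetch_once(url):
    """Hypothetical stand-in for the real HTTP request; only simulates latency."""
    time.sleep(0.01)
    return f"response from {url}"


def poll_forever(url, stop, counts, lock):
    # Keep requesting this single URL until the stop event is set.
    while not stop.is_set():
        fetch_once(url)
        with lock:
            counts[url] += 1


def run_pollers(urls, duration):
    stop = threading.Event()
    lock = threading.Lock()
    counts = {url: 0 for url in urls}
    threads = [
        threading.Thread(target=poll_forever, args=(url, stop, counts, lock), daemon=True)
        for url in urls
    ]
    for t in threads:
        t.start()
    # A real poller would block until Ctrl+C (KeyboardInterrupt) and then call stop.set().
    time.sleep(duration)
    stop.set()
    for t in threads:
        t.join()
    return counts


if __name__ == "__main__":
    print(run_pollers(["http://example.com/?id=1", "http://example.com/?id=2"], 0.2))
```

Each thread blocks while waiting for its response, so throughput grows roughly with the number of threads until the server or proxies push back.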

Then I tried the aiohttp library to run the requests asynchronously. My code looks like this (each URL is built from url_base and product.id, and each URL uses a different proxy for its request):

import asyncio

import aiohttp

# url_base, hermes_products and global_proxy are defined elsewhere in my script.


async def fetch(product, i, proxies, session):

    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'}

    # Request the same URL over and over, through this URL's own proxy.
    while True:
        try:
            async with session.get(
                url_base + product.id,
                proxy=proxies[i],
                headers=headers,
                ssl=False,
            ) as response:
                content = await response.read()
                print(content)
        except Exception as e:
            print('ERROR ', str(e))


async def startQuery(proxies):
    tasks = []
    async with aiohttp.ClientSession() as session:
        # One never-ending fetch() task per product/URL.
        for i, product in enumerate(hermes_products):
            task = asyncio.ensure_future(fetch(product, i, proxies, session))
            tasks.append(task)
        await asyncio.gather(*tasks)


asyncio.run(startQuery(global_proxy))
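One common way to keep a tight polling loop like this from overwhelming the proxies, assuming the 503 'Too many open connections' errors come from too many simultaneous requests, is to cap how many requests are in flight at once and pause briefly between repeated requests to the same URL. The minimal sketch below shows that pattern with an `asyncio.Semaphore`; `fake_get` is a hypothetical stand-in for `session.get(...)`, and `MAX_CONCURRENT` and `PAUSE_SECONDS` are assumed values to tune, not numbers taken from the question:

```python
import asyncio

MAX_CONCURRENT = 10   # assumed cap on simultaneous requests; tune for your proxies
PAUSE_SECONDS = 0.5   # assumed pause between two requests for the same URL


async def fake_get(url):
    """Hypothetical stand-in for `session.get(url, proxy=...)`: just waits."""
    await asyncio.sleep(0.01)
    return b"ok"


async def poll(url, sem, stats, pause):
    # Request `url` forever, but never exceed the semaphore's concurrency limit.
    while True:
        async with sem:
            stats["in_flight"] += 1
            stats["peak"] = max(stats["peak"], stats["in_flight"])
            try:
                await fake_get(url)
            finally:
                stats["in_flight"] -= 1
        stats["done"] += 1
        await asyncio.sleep(pause)  # breathing room for the server/proxy


async def run(urls, duration, max_concurrent=MAX_CONCURRENT, pause=PAUSE_SECONDS):
    sem = asyncio.Semaphore(max_concurrent)
    stats = {"in_flight": 0, "peak": 0, "done": 0}
    tasks = [asyncio.create_task(poll(u, sem, stats, pause)) for u in urls]
    # A real poller would run until cancelled (e.g. Ctrl+C); here we stop after `duration`.
    await asyncio.sleep(duration)
    for t in tasks:
        t.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return stats


if __name__ == "__main__":
    print(asyncio.run(run([f"url-{i}" for i in range(50)], 0.5)))
```

With aiohttp itself, a similar cap can also be placed on the session's connection pool via `aiohttp.ClientSession(connector=aiohttp.TCPConnector(limit=...))`; the semaphore makes the limit explicit and independent of the HTTP client.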

What I observe: 1) it is not as fast as I expected; it is actually slower than the threaded version. 2) More importantly, the requests only succeed at the beginning of the run; soon almost all of them fail with errors such as:

ERROR  Cannot connect to host PROXY_IP:PORT ssl:False [Connect call failed ('PROXY_IP', PORT)]

or

ERROR  503, message='Too many open connections'

or

ERROR  [Errno 54] Connection reset by peer

Am I doing something wrong here (particularly with the while True loop)? If so, how can I achieve my goal properly?
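For the connection errors specifically (connection refused, connection reset), a common pattern is to back off after a failure instead of retrying immediately inside the `while True` loop. The sketch below assumes exponential backoff with a cap; `poll_with_backoff`, `backoff_delay` and their parameters are hypothetical names, `fetch` is any coroutine function that performs one request, and `max_requests` only exists so the example terminates:

```python
import asyncio


def backoff_delay(failures, base=0.5, cap=30.0):
    """Seconds to wait after `failures` consecutive errors (0-based): base * 2**failures, capped."""
    return min(cap, base * (2 ** failures))


async def poll_with_backoff(url, fetch, max_requests, base=0.5, cap=30.0, sleep=asyncio.sleep):
    """Call `fetch(url)` repeatedly, waiting longer after each consecutive failure.

    Returns the list of delays that were applied, which makes the behaviour easy to check.
    """
    failures = 0
    delays = []
    for _ in range(max_requests):
        try:
            await fetch(url)
            failures = 0  # a success resets the backoff
        except OSError:  # with aiohttp you would probably catch aiohttp.ClientError instead
            delay = backoff_delay(failures, base, cap)
            failures += 1
            delays.append(delay)
            await sleep(delay)
    return delays


if __name__ == "__main__":
    attempts = {"n": 0}

    async def flaky_fetch(url):
        # Hypothetical fetch that fails three times, then succeeds.
        attempts["n"] += 1
        if attempts["n"] <= 3:
            raise ConnectionResetError("[Errno 54] Connection reset by peer")
        return b"ok"

    print(asyncio.run(poll_with_backoff("http://example.com/?id=1", flaky_fetch, 5, base=0.01)))
```

In a long-running poller you would typically also pause between successful requests, so that a healthy proxy is not hammered either.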
