Timeouts on Individual Threads

I am running a web scraping task in which multiple scrapers run concurrently. Sometimes a scraper gets stuck non-deterministically, and there is nothing I can do about that. After a while the entire script hangs. I am running more than 1000 scrapers in total, with max_workers set to 20, so my guess is that eventually all 20 workers get stuck. What I want is a timeout on each individual thread: if a thread has been running for more than 120 seconds, it should be killed or cancelled and the event logged.

I found the pebble library, but interestingly it supports timeouts only for ProcessPool and not for ThreadPool. My machine would crash if I used a ProcessPool. Is there a way to implement a timeout on individual threads in Python?

Here is what I tried:

import concurrent.futures

def func(t):
    # Simulates a stuck scraper: loops forever when t is truthy.
    while t:
        c = 1
    return 'yo'

t = [0, 0, 0, 1, 1, 1, 1, 1, 0]
print(t)
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    futures = []
    for flag in t:
        future = pool.submit(func, flag)
        futures.append(future)
        # Remember the argument on the future so it can be printed later.
        future.args = str(flag)
    for future in futures:
        try:
            # Wait at most 3 seconds for this task's result.
            result = future.result(timeout=3)
            print(result + future.args)
        except Exception as e:
            print(e)
            print('timeout' + future.args)

It doesn't even print the exception message. The script just hangs after printing this:

[0, 0, 0, 1, 1, 1, 1, 1, 0]
yo0
yo0
yo0
timeout1
timeout1
timeout1
timeout1
timeout1
timeout0

I also tried adding future.cancel() in the except block, but I got the same result. What should I do?
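
For context, here is one possible workaround, offered as a sketch rather than a definitive fix. As far as I understand it, the attempt above hangs because leaving the `with` block calls `shutdown(wait=True)`, which waits for every running task, and `future.cancel()` cannot cancel a call that has already started. Python also has no supported way to kill a thread. The sketch below therefore runs each task in its own daemon thread and lets the pool threads act only as supervisors that give up after the timeout. The names `run_with_timeout` and `run_all` are hypothetical helpers, and `func` is a stand-in for a real scraper.

```python
import logging
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def func(t):
    # Stand-in for a scraper that hangs whenever t is truthy.
    while t:
        time.sleep(0.05)
    return "yo"


def run_with_timeout(task, arg, timeout):
    """Run task(arg) in a daemon thread and wait at most `timeout` seconds.

    Returns a (status, value) tuple; status is "ok", "error" or "timeout".
    """
    box = {}

    def target():
        try:
            box["value"] = task(arg)
        except Exception as exc:  # report failures instead of losing them
            box["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        # The thread is abandoned, not killed: it keeps running in the background.
        logging.warning("task %r timed out after %s s", arg, timeout)
        return ("timeout", None)
    if "error" in box:
        return ("error", box["error"])
    return ("ok", box.get("value"))


def run_all(task, args, timeout, max_workers):
    # Pool threads only supervise, so each one is free again after `timeout`.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda a: run_with_timeout(task, a, timeout), args))


if __name__ == "__main__":
    t = [0, 0, 0, 1, 1, 1, 1, 1, 0]
    for arg, (status, value) in zip(t, run_all(func, t, timeout=1, max_workers=4)):
        print(arg, status, value)
```

The trade-off is that a timed-out thread is only abandoned: it keeps holding its sockets and CPU time until the process exits, so with 1000 scrapers the leaked threads can add up. Where the hang comes from network I/O, setting a timeout on the blocking call itself (for example the `timeout` argument of `requests.get`) is usually the cleaner option, because the thread then returns on its own.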
