Scrapy: handle CLOSESPIDER_TIMEOUT in a middleware


I have multiple scrapers that I want to put a time limit on. The CLOSESPIDER_TIMEOUT setting does the job, and the spider finishes with finish_reason: closespider_timeout.

I want to intercept this and use the logging library to log an error. How can I do this? Should it go in a middleware?
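For context, the timeout is currently enabled with a single setting (the value here is just an example):

```python
# settings.py -- stop each spider after 3600 seconds (example value)
CLOSESPIDER_TIMEOUT = 3600
```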



Answer by wRAR:

CLOSESPIDER_TIMEOUT is handled by the CloseSpider extension which works by scheduling a Twisted task that closes the spider after the time has passed. It's unclear to me if you want to keep this behavior or override it.

If you want to override it without disabling the extension, you can subclass it and replace the code it schedules in spider_opened() with whatever you want.
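A minimal sketch of that subclassing approach. The attribute names (self.close_on, self.task, self.crawler) are assumptions based on how recent Scrapy versions implement the CloseSpider extension; check the source of your installed version before relying on them:

```python
import logging

from scrapy.extensions.closespider import CloseSpider
from twisted.internet import reactor

logger = logging.getLogger(__name__)


class LoggingCloseSpider(CloseSpider):
    """Variant of the CloseSpider extension that logs an error
    before closing the spider on CLOSESPIDER_TIMEOUT."""

    def spider_opened(self, spider):
        # Schedule our own callback instead of the parent's.
        # self.close_on["timeout"] and self.crawler mirror the stock
        # extension's attributes in recent Scrapy versions (assumption).
        self.task = reactor.callLater(
            self.close_on["timeout"], self._timeout_exceeded, spider
        )

    def _timeout_exceeded(self, spider):
        logger.error("Spider %s hit CLOSESPIDER_TIMEOUT", spider.name)
        self.crawler.engine.close_spider(spider, reason="closespider_timeout")
```

You would then register this subclass in EXTENSIONS in place of the stock scrapy.extensions.closespider.CloseSpider entry, keeping CLOSESPIDER_TIMEOUT set as usual.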

If you want to keep the default behavior while adding your own handling, you can either do the same subclassing or just subscribe to the spider_closed signal and check the close reason there.
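A sketch of the signal approach, written as a small custom extension (the class name TimeoutLogger and the module path below are placeholders):

```python
import logging

from scrapy import signals

logger = logging.getLogger(__name__)


class TimeoutLogger:
    """Log an error whenever a spider is closed with reason closespider_timeout."""

    @classmethod
    def from_crawler(cls, crawler):
        ext = cls()
        crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)
        return ext

    def spider_closed(self, spider, reason):
        # The spider_closed signal passes the close reason, which is
        # "closespider_timeout" when the CloseSpider extension fired.
        if reason == "closespider_timeout":
            logger.error("Spider %s was stopped by CLOSESPIDER_TIMEOUT", spider.name)
```

Enable it through the EXTENSIONS setting:

```python
# settings.py (module path is an example)
EXTENSIONS = {
    "myproject.extensions.TimeoutLogger": 500,
}
```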