Scrapy: handle CLOSESPIDER_TIMEOUT in a middleware


I have multiple scrapers that I want to put a time limit on. The CLOSESPIDER_TIMEOUT setting does the job, and the spider finishes with finish_reason: closespider_timeout.

I want to intercept this and use the logging library to log an error. How can I do this? Should it go in a middleware?
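For context, the timeout is currently enabled with a single setting (the value here is just an example):

```python
# settings.py -- stop each spider after 3600 seconds (example value)
CLOSESPIDER_TIMEOUT = 3600
```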



Answer by wRAR:

CLOSESPIDER_TIMEOUT is handled by the CloseSpider extension which works by scheduling a Twisted task that closes the spider after the time has passed. It's unclear to me if you want to keep this behavior or override it.

If you want to override it without disabling the extension, you can subclass it and replace the code it schedules in spider_opened() with whatever you want.
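A minimal sketch of that subclassing approach. The attribute names (self.close_on, self.task, self.crawler) are assumptions based on how recent Scrapy versions implement the CloseSpider extension; check the source of your installed version before relying on them:

```python
import logging

from scrapy.extensions.closespider import CloseSpider
from twisted.internet import reactor

logger = logging.getLogger(__name__)


class LoggingCloseSpider(CloseSpider):
    """Variant of the CloseSpider extension that logs an error
    before closing the spider on CLOSESPIDER_TIMEOUT."""

    def spider_opened(self, spider):
        # Schedule our own callback instead of the parent's.
        # self.close_on["timeout"] and self.crawler mirror the stock
        # extension's attributes in recent Scrapy versions (assumption).
        self.task = reactor.callLater(
            self.close_on["timeout"], self._timeout_exceeded, spider
        )

    def _timeout_exceeded(self, spider):
        logger.error("Spider %s hit CLOSESPIDER_TIMEOUT", spider.name)
        self.crawler.engine.close_spider(spider, reason="closespider_timeout")
```

You would then register this subclass in EXTENSIONS in place of the stock scrapy.extensions.closespider.CloseSpider entry, keeping CLOSESPIDER_TIMEOUT set as usual.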

If you want to keep the default behavior while adding your own handling, you can either do the same subclassing or just subscribe to the spider_closed signal and check the close reason there.
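A sketch of the signal approach, written as a small custom extension (the class name TimeoutLogger and the module path below are placeholders):

```python
import logging

from scrapy import signals

logger = logging.getLogger(__name__)


class TimeoutLogger:
    """Log an error whenever a spider is closed with reason closespider_timeout."""

    @classmethod
    def from_crawler(cls, crawler):
        ext = cls()
        crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)
        return ext

    def spider_closed(self, spider, reason):
        # The spider_closed signal passes the close reason, which is
        # "closespider_timeout" when the CloseSpider extension fired.
        if reason == "closespider_timeout":
            logger.error("Spider %s was stopped by CLOSESPIDER_TIMEOUT", spider.name)
```

Enable it through the EXTENSIONS setting:

```python
# settings.py (module path is an example)
EXTENSIONS = {
    "myproject.extensions.TimeoutLogger": 500,
}
```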