I have multiple scrapers that I want to put a time limit on. The CLOSESPIDER_TIMEOUT setting does the job, and the spider finishes with finish_reason: closespider_timeout.
I want to intercept this and use the logging library to log an error. How can I do this? Should it go in a middleware?
CLOSESPIDER_TIMEOUT is handled by the CloseSpider extension, which works by scheduling a Twisted task that closes the spider after the time has passed. It's unclear to me whether you want to keep this behavior or override it.
If you want to override it without disabling the extension, you can subclass it and change the code it schedules in spider_opened() to whatever you want.
If you want to keep it while adding your own handling, you can either do the same subclassing or just subscribe to the spider_closed signal, which receives the close reason.