Scrapy Spider Initialization with Twisted: Exception Not Raised and Process Freezes


I'm working on a Scrapy project with a custom spider. In the spider's __init__ method, I raise an exception to handle a specific error condition. However, the exception never surfaces during spider initialization, and the whole process freezes instead of exiting.

Here's a simplified version of my code:

from twisted.internet import asyncioreactor

asyncioreactor.install()

from twisted.internet import reactor
from scrapy import Spider
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings


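class MySpider(Spider):
    # Scrapy requires a spider name; without it super().__init__() raises ValueError first
    name = 'my_spider'

    def __init__(self, name=None, **kwargs):
        super().__init__(name=name, **kwargs)
        raise Exception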


def main():
    configure_logging()
    settings = get_project_settings()
    runner = CrawlerRunner(settings)
    runner.crawl(MySpider)
    d = runner.join()
    d.addBoth(lambda _: reactor.stop())
    reactor.run()


if __name__ == '__main__':
    main()

I expected the Exception to be raised during the initialization of MySpider, but it seems that the process just hangs without any exception being raised. However, when I raise an exception during the parsing process, it works as expected, and the parsing process stops.

Could someone please explain why the exception is not raised during the spider's initialization and why the process freezes instead? Thank you!

Stepping through with a debugger, I found that the code blocks on the reactor.run() line. The spider, however, is instantiated much earlier, in the runner.crawl(MySpider) call. I expected an exception thrown during spider initialization to interrupt the flow and be logged or displayed. Instead, Twisted appears to handle the exception internally, prints no error message, and carries on.
