So I'm trying to write a crawler that uses Scrapy-playwright.
In a previous project I used plain Scrapy and set RETRY_TIMES = 3. Even when I couldn't reach the needed resource, the spider would try the request 3 times and only then close.
Here I've tried the same, but it doesn't seem to work: on the first error I get, the spider closes. Can somebody help me, please? What should I do to make the spider retry a URL as many times as I need?
Here is an example of my settings.py:
import random  # needed for the DOWNLOAD_DELAY line below

RETRY_ENABLED = True
RETRY_TIMES = 3
DOWNLOAD_TIMEOUT = 60
DOWNLOAD_DELAY = random.uniform(0, 1)
DOWNLOAD_HANDLERS = {
"http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
"https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
Thanks in advance!
Make sure to catch and log exceptions within your Playwright scripts. This will help you identify whether the Playwright code itself is raising errors that cause the spider to close.
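A minimal sketch of one way to do that with a Scrapy errback (the spider name and URL are placeholders, and get_retry_request requires Scrapy 2.5 or newer). The errback logs whatever failure the Playwright download handler raised and then re-schedules the request, so your RETRY_TIMES limit is still respected:

import scrapy
from scrapy.downloadermiddlewares.retry import get_retry_request

class ExampleSpider(scrapy.Spider):
    name = "example"  # placeholder name

    def start_requests(self):
        yield scrapy.Request(
            "https://example.com",  # placeholder URL
            meta={"playwright": True},
            callback=self.parse,
            errback=self.errback,
        )

    def parse(self, response):
        self.logger.info("Loaded %s", response.url)

    def errback(self, failure):
        # Log the underlying error so you can see why the request failed
        self.logger.error("Request failed: %r", failure.value)
        # Re-schedule the failed request; returns None once retries are exhausted
        retry_request = get_retry_request(
            failure.request, spider=self, reason=repr(failure.value)
        )
        if retry_request:
            yield retry_request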
You've set DOWNLOAD_TIMEOUT to 60 seconds. Make sure that is long enough for the kinds of pages you are loading; if pages regularly take longer than that to render, the resulting timeout errors may be what's closing the spider rather than the retry behavior you expect.
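If I'm not mistaken, scrapy-playwright also has its own navigation timeout setting, PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT (in milliseconds), separate from Scrapy's DOWNLOAD_TIMEOUT (in seconds). A sketch of keeping the two aligned in settings.py (the numbers are only examples):

# Scrapy's download timeout, in seconds
DOWNLOAD_TIMEOUT = 60
# scrapy-playwright's navigation timeout, in milliseconds; keeping it in step
# with DOWNLOAD_TIMEOUT avoids one limit silently cutting off the other
PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT = 60 * 1000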