Crawlee no such file or directory storage/request_queues/default/[id].json


I'm trying to run a fairly simple scraper, but I keep hitting the error in the title. I want to scrape around 64,000 pages, and the "no such file or directory" error shows up every time. Setting waitForAllRequestsToBeAdded to true doesn't fix it (see the snippet below for how I'm passing that option), I get the same error even when I cut the run down to 1,000 pages, and changing the crawler options doesn't seem to help either.
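In case the exact call matters, this is roughly how I was passing waitForAllRequestsToBeAdded; requests here is the same array of Request objects shown in the full setup further down:

// Attempt at waiting for the whole queue to be populated before crawling.
// (I believe run() forwards these options to addRequests() internally,
// but the result was the same either way.)
await crawler.run(requests, {
    waitForAllRequestsToBeAdded: true
});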

The full error I'm getting is:

ERROR PlaywrightCrawler:AutoscaledPool: runTaskFunction failed.
  Error: ENOENT: no such file or directory, open '/project/storage/request_queues/default/ehieKpeBY6Mf39n.json'

This is how I'm setting up and running the crawler:

import { PlaywrightCrawler, Configuration, Request } from 'crawlee';

// Keep navigation/handler timeouts short, retry failed requests, and
// allow up to 20 concurrent pages.
const opts = {
    navigationTimeoutSecs: 3,
    requestHandlerTimeoutSecs: 3,
    maxRequestRetries: 6,
    maxConcurrency: 20
};
const config = new Configuration({
    memoryMbytes: 8000
});
const crawler = new PlaywrightCrawler(opts, config);

// handlePage is my request handler, defined elsewhere.
crawler.router.addDefaultHandler(handlePage);

// data is an array of ~64,000 objects, each with a url plus some metadata
// that I pass along as userData.
const requests = data.map(
    (d) =>
        new Request({
            url: d.url,
            userData: d
        })
);

await crawler.run(requests);

How can I get the crawler to run without the request queue failing?
