crawlee - How to add the same URL back to the requestQueue


How do I enqueue the same URL that I am currently handling the request for? I have this code and want to scrape the same URL again (possibly with a delay). I added the environment-variable settings so that cached results are deleted, according to this answer.

import { RequestQueue, CheerioCrawler, Configuration } from "crawlee";

const config = Configuration.getGlobalConfig();
config.set('persistStorage', false);
config.set('purgeOnStart', false);

const requestQueue = await RequestQueue.open();
await requestQueue.addRequest({ url: "https://www.google.com/" });

const crawler = new CheerioCrawler({
    requestQueue,
    async requestHandler({ $, request }) {
        console.log("Do something with scraped data...");
        await crawler.addRequests([{url: "https://www.google.com/"}]);
    }
})

await crawler.run();

There is 1 best solution below


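I found a solution: add a unique key to the request object. Crawlee deduplicates requests by their uniqueKey, which defaults to the (normalized) URL, so a second request for the same URL is silently skipped. Setting uniqueKey to a different value each time, for example a counter that is incremented before each new request is queued, solves the problem.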

{url: "https://www.google.com/", uniqueKey: counter.toString()}
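Here is a minimal sketch of how the counter idea could be wrapped up, including the optional delay mentioned in the question. The helper names makeRepeatRequest and reEnqueue are hypothetical (they are not part of Crawlee). The only Crawlee call used is crawler.addRequests, as in the question's code.

```javascript
// Hypothetical helpers; only crawler.addRequests comes from the question's code.
let counter = 0;

// Build a request whose uniqueKey differs on every call, so the queue
// does not treat it as a duplicate of a URL it has already seen.
function makeRepeatRequest(url) {
    counter += 1;
    return { url, uniqueKey: `${url}#${counter}` };
}

// Optionally wait delayMs milliseconds, then add the same URL back to the queue.
async function reEnqueue(crawler, url, delayMs = 0) {
    if (delayMs > 0) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
    const request = makeRepeatRequest(url);
    await crawler.addRequests([request]);
    return request;
}
```

Inside the requestHandler this would be called as `await reEnqueue(crawler, request.url, 5000);`. Two caveats: the wait happens inside the handler, so it counts toward the handler timeout, and without a stop condition (for example a maximum counter value) the crawler will keep re-queuing the URL indefinitely.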