I'm using Apify to crawl a job board with multiple requests running concurrently. I have an array of proxies, but my queued URLs aren't using them in round-robin fashion even though I use the setup below. How can I make every newly requested URL use a different proxy, round-robin style? In other words, if I have 10 proxy URLs and a max concurrency of 5, how do I set up my crawler so that only half of my proxies are in use in any batch of requests, while the others finish their requests and take a break? I'm running into 429 errors because the same proxy is requesting new pages too quickly.
Right now proxy 1 handles requests 1, 2, 3 (then gets blocked), then proxy 2 handles requests 4, 5, 6 (then gets blocked), and so on.
How do I make it so that proxy 1 handles request 1, proxy 2 handles request 2, proxy 3 handles request 3, etc.?
const collector = new Apify.PlaywrightCrawler({
    requestQueue,
    proxyConfiguration,
    useSessionPool: true,
    persistCookiesPerSession: true,
    launchContext: {
        launchOptions: {
            headless: true,
        },
    },
    maxConcurrency: 4,
    handlePageFunction: handleFunctionCollection,
    // This function is called if the page processing failed more than maxRequestRetries+1 times.
    handleFailedRequestFunction: async ({ request }) => {
        console.log(`Request ${request.url} failed too many times.`);
    },
});
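For context, the proxyConfiguration above is built from my proxy array roughly like this (the URLs below are placeholders):

const proxyConfiguration = await Apify.createProxyConfiguration({
    proxyUrls: [
        'http://user:pass@proxy1.example.com:8000',
        'http://user:pass@proxy2.example.com:8000',
        // ... 8 more proxy URLs
    ],
});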
A few things:
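One option (a sketch, assuming Apify SDK v2 and its session pool options maxPoolSize and maxUsageCount) is to cap how many requests each session may serve. With custom proxyUrls, each new session is assigned the next proxy URL in rotation, so retiring a session after a single use pushes every subsequent request onto the next proxy:

const collector = new Apify.PlaywrightCrawler({
    requestQueue,
    proxyConfiguration,
    useSessionPool: true,
    // Note: with sessions retired after one use, persisted cookies
    // won't carry over between requests.
    persistCookiesPerSession: true,
    sessionPoolOptions: {
        // One session slot per proxy URL (10 proxies in your case).
        maxPoolSize: 10,
        sessionOptions: {
            // Retire each session after a single request, so the pool
            // creates a fresh session, and the fresh session picks up
            // the next proxy URL from the rotation.
            maxUsageCount: 1,
        },
    },
    maxConcurrency: 5,
    handlePageFunction: handleFunctionCollection,
    handleFailedRequestFunction: async ({ request }) => {
        console.log(`Request ${request.url} failed too many times.`);
    },
});

With maxConcurrency: 5 and 10 proxies, at most half of the proxies are in flight at once, so each one gets a breather between uses. Whether the rotation is strictly round-robin depends on how the session pool hands out new sessions, so treat this as a starting point rather than a guarantee.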