I'm trying to use Scrapinghub to crawl a website that heavily rate-limits requests.
If I run the spider as-is, I start getting 429 responses pretty quickly.
If I enable Crawlera per the standard instructions, the spider doesn't work anymore.
If I set headers = {"X-Crawlera-Cookies": "disable"}, the spider works again, but the 429s come back -- so I assume the rate limiter keys (at least partly) on the cookie.
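For reference, the setup is roughly this (the spider name, target URL and API key are placeholders; the Crawlera part follows the standard scrapy-crawlera settings):

```python
import scrapy


class MySpider(scrapy.Spider):  # placeholder spider
    name = "myspider"
    start_urls = ["https://example.com/"]  # placeholder target

    custom_settings = {
        # Crawlera enabled as per the standard scrapy-crawlera instructions
        "DOWNLOADER_MIDDLEWARES": {"scrapy_crawlera.CrawleraMiddleware": 610},
        "CRAWLERA_ENABLED": True,
        "CRAWLERA_APIKEY": "<API key>",
        # the header that makes the spider work again, but brings the 429s back
        "DEFAULT_REQUEST_HEADERS": {"X-Crawlera-Cookies": "disable"},
    }

    def parse(self, response):
        yield {"url": response.url}
```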
So what would a good approach be here?
You can try rotating user agents with a RandomUserAgent downloader middleware. If you don't want to write your own implementation, you can try this one:
https://github.com/cnu/scrapy-random-useragent
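If you do want to roll your own, a minimal version is just a downloader middleware that sets a random User-Agent on each request. This is only a sketch -- the USER_AGENTS setting name and the myproject.middlewares module path are placeholders you'd adapt to your project:

```python
import random


class RandomUserAgentMiddleware:
    """Downloader middleware that sets a random User-Agent on every request."""

    def __init__(self, user_agents):
        self.user_agents = user_agents

    @classmethod
    def from_crawler(cls, crawler):
        # USER_AGENTS is a custom setting here: a plain list of UA strings you supply
        return cls(crawler.settings.getlist("USER_AGENTS"))

    def process_request(self, request, spider):
        if self.user_agents:
            request.headers["User-Agent"] = random.choice(self.user_agents)
        # returning None lets the request continue through the middleware chain
        return None
```

Then enable it in settings.py (or the spider's custom_settings), disabling Scrapy's built-in UserAgentMiddleware in favour of the random one:

```python
DOWNLOADER_MIDDLEWARES = {
    # turn off the built-in UserAgentMiddleware, use the random one instead
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    "myproject.middlewares.RandomUserAgentMiddleware": 400,  # placeholder path
}

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.0 Safari/605.1.15",
]
```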