I'm trying to use Scrapinghub to crawl a website that heavily rate-limits requests.
If I run the spider as-is, I start getting 429 responses pretty quickly.
If I enable Crawlera per the standard instructions, the spider doesn't work anymore.
If I set headers = {"X-Crawlera-Cookies": "disable"}, the spider works again, but the 429s come back -- so I assume the rate limiter keys (at least partly) on the cookie.
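For reference, the setup is roughly this (the spider name, target URL and API key are placeholders; the Crawlera part follows the standard scrapy-crawlera settings):

```python
import scrapy


class MySpider(scrapy.Spider):  # placeholder spider
    name = "myspider"
    start_urls = ["https://example.com/"]  # placeholder target

    custom_settings = {
        # Crawlera enabled as per the standard scrapy-crawlera instructions
        "DOWNLOADER_MIDDLEWARES": {"scrapy_crawlera.CrawleraMiddleware": 610},
        "CRAWLERA_ENABLED": True,
        "CRAWLERA_APIKEY": "<API key>",
        # the header that makes the spider work again, but brings the 429s back
        "DEFAULT_REQUEST_HEADERS": {"X-Crawlera-Cookies": "disable"},
    }

    def parse(self, response):
        yield {"url": response.url}
```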
So what would a good approach be here?
You can try rotating user agents with a RandomUserAgent downloader middleware. If you don't want to write your own implementation, you can try this one:
https://github.com/cnu/scrapy-random-useragent
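If you do want to roll your own, a minimal version is just a downloader middleware that sets a random User-Agent on each request. This is only a sketch -- the USER_AGENTS setting name and the myproject.middlewares module path are placeholders you'd adapt to your project:

```python
import random


class RandomUserAgentMiddleware:
    """Downloader middleware that sets a random User-Agent on every request."""

    def __init__(self, user_agents):
        self.user_agents = user_agents

    @classmethod
    def from_crawler(cls, crawler):
        # USER_AGENTS is a custom setting here: a plain list of UA strings you supply
        return cls(crawler.settings.getlist("USER_AGENTS"))

    def process_request(self, request, spider):
        if self.user_agents:
            request.headers["User-Agent"] = random.choice(self.user_agents)
        # returning None lets the request continue through the middleware chain
        return None
```

Then enable it in settings.py (or the spider's custom_settings), disabling Scrapy's built-in UserAgentMiddleware in favour of the random one:

```python
DOWNLOADER_MIDDLEWARES = {
    # turn off the built-in UserAgentMiddleware, use the random one instead
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    "myproject.middlewares.RandomUserAgentMiddleware": 400,  # placeholder path
}

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.0 Safari/605.1.15",
]
```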