Scrapy HTTP status code is not handled or not allowed

I am trying to scrape all the shoe data from https://www.matchesfashion.com/intl/mens/shop/shoes?page=1 by following the "next" button up to page 7. But when I do so, I get an "HTTP status code is not handled or not allowed" error.

Code Snippet

Error Snippet

There is 1 best solution below

In the output you'll see that it retried your request 3 times. All of those requests got a response from the server with status code 429. That status code (Too Many Requests) means that the server rejected your request because you sent too many requests in a given period.

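By default, Scrapy's HttpErrorMiddleware drops responses whose status code is outside the 2xx range before they reach your spider, since they won't contain the data you're looking for. That is what the "HTTP status code is not handled or not allowed" message reports.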
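To make that rule concrete, here is a minimal sketch in plain Python. It is a simplified model, not Scrapy's actual code: the function name `reaches_callback` and its `allowed_codes` parameter are made up for illustration. In a real spider, the equivalent opt-in is the `handle_httpstatus_list` spider attribute or the `HTTPERROR_ALLOWED_CODES` setting.

```python
def reaches_callback(status, allowed_codes=()):
    """Simplified model: 2xx responses pass; anything else must be explicitly allowed."""
    if 200 <= status < 300:
        return True
    # Non-2xx responses only reach the spider if their code was opted in.
    return status in allowed_codes


print(reaches_callback(200))                       # normal page
print(reaches_callback(429))                       # rate-limited, filtered out
print(reaches_callback(429, allowed_codes=[429]))  # rate-limited, but opted in
```

Note that allowing 429 only lets your callback see the response; the body is still an error page, so slowing down remains the actual fix.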

To get around this, either use a proxy service such as Scraper API or Crawlera, or increase download_delay in Scrapy until you no longer get blocked. For example:

import scrapy

class Website2Spider(scrapy.Spider):
    download_delay = 2  # Seconds Scrapy waits between consecutive requests to the same site.
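
The same throttling can also be set project-wide rather than per spider. The fragment below is a sketch of what that might look like in settings.py. The setting names are standard Scrapy settings, but the numbers are illustrative assumptions rather than values tuned for this site.

```python
# settings.py (sketch; all numeric values are assumptions to adjust by trial)

# Base delay in seconds between requests to the same site.
DOWNLOAD_DELAY = 2

# Vary the actual wait around DOWNLOAD_DELAY so requests look less uniform.
RANDOMIZE_DOWNLOAD_DELAY = True

# Let the AutoThrottle extension adapt the delay to how the server responds.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 2   # initial delay in seconds
AUTOTHROTTLE_MAX_DELAY = 30    # upper bound for the adaptive delay

# Send only one request at a time to each domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 1
```

If you still see 429 responses with these values, raise the delay step by step before reaching for a proxy.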