Scrapy redirects to homepage for some urls

Question


662 Views Asked by Aditya At 02 July 2025 at 04:44

I am new to the Scrapy framework and am currently using it to extract articles from multiple 'Health & Wellness' websites. For some requests, Scrapy is redirected to the homepage (this behavior is not observed in a browser). Below is an example:

Command:
scrapy shell "http://www.bornfitness.com/blog/page/10/"

Result:
2015-06-19 21:32:15+0530 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
2015-06-19 21:32:15+0530 [default] INFO: Spider opened
2015-06-19 21:32:15+0530 [default] DEBUG: Redirecting (301) to <GET http://www.bornfitness.com/> from <GET http://www.bornfitness.com/blog/page/10/>
2015-06-19 21:32:16+0530 [default] DEBUG: Crawled (200) <GET http://www.bornfitness.com/> (referer: None)

Note that the page number in the URL (10) is a two-digit number. I don't see this issue for URLs with a single-digit page number (8, for example).

Result:
2015-06-19 21:43:15+0530 [default] INFO: Spider opened
2015-06-19 21:43:16+0530 [default] DEBUG: Crawled (200) <GET http://www.bornfitness.com/blog/page/8/> (referer: None)


**tegancp** · Accepted Answer

When you have trouble replicating browser behavior with Scrapy, look at what is communicated differently when your browser talks to the website compared with when your spider does. Remember that a website is (almost always) designed to interact with web browsers, not to be nice to web crawlers.

In your situation, if you look at the headers sent with your Scrapy request (for example, in the Scrapy shell), you should see something like:

In [1]: request.headers
Out[1]:
{'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
 'Accept-Encoding': 'gzip,deflate',
 'Accept-Language': 'en',
 'User-Agent': 'Scrapy/0.24.6 (+http://scrapy.org)'}

If you examine the headers sent by a request for the same page by your web browser, you might see something like:

**Request Headers**

GET /blog/page/10/ HTTP/1.1    
Host: www.bornfitness.com    
Connection: keep-alive    
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.124 Safari/537.36
DNT: 1    
Referer: http://www.bornfitness.com/blog/page/11/
Accept-Encoding: gzip, deflate, sdch    
Accept-Language: en-US,en;q=0.8
Cookie: fealty_segment_registeronce=1; ... ... ...
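To make the comparison concrete, here is a minimal sketch that lists the headers that differ between the two requests shown above. The helper name `diff_headers` is hypothetical, the header values are copied from the listings above as plain strings (Scrapy itself may store them differently), and the truncated `Cookie` header is left out because its full value is not shown.

```python
def diff_headers(a, b):
    """Return {header_name: (value_in_a, value_in_b)} for headers that differ.

    Header names are compared case-insensitively; a missing header is None.
    """
    a_norm = {name.lower(): value for name, value in a.items()}
    b_norm = {name.lower(): value for name, value in b.items()}
    return {
        name: (a_norm.get(name), b_norm.get(name))
        for name in sorted(set(a_norm) | set(b_norm))
        if a_norm.get(name) != b_norm.get(name)
    }


# Headers sent by Scrapy (from the shell output above).
scrapy_headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Encoding": "gzip,deflate",
    "Accept-Language": "en",
    "User-Agent": "Scrapy/0.24.6 (+http://scrapy.org)",
}

# Headers sent by the browser (from the listing above, Cookie omitted).
browser_headers = {
    "Host": "www.bornfitness.com",
    "Connection": "keep-alive",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/43.0.2357.124 Safari/537.36"
    ),
    "DNT": "1",
    "Referer": "http://www.bornfitness.com/blog/page/11/",
    "Accept-Encoding": "gzip, deflate, sdch",
    "Accept-Language": "en-US,en;q=0.8",
}

# Print each differing header with the Scrapy value and the browser value.
for name, (scrapy_value, browser_value) in diff_headers(scrapy_headers, browser_headers).items():
    print(f"{name}:\n  scrapy:  {scrapy_value}\n  browser: {browser_value}")
```

Any of the differing headers could in principle trigger different server behavior, so it can help to change them one at a time.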

Try changing the User-Agent in your request to a browser-like value. The server appears to treat Scrapy's default User-Agent differently, so this should allow you to get around the redirect.
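One way to do this, as a sketch rather than the only approach, is to set Scrapy's `USER_AGENT` setting in your project's `settings.py`. The value below is simply the browser User-Agent from the headers above, and the `DEFAULT_REQUEST_HEADERS` entries are optional, illustrative assumptions rather than something the site is known to require.

```python
# settings.py (sketch): values are illustrative, copied from the browser
# request shown above; adjust them for your own project.

# Replace Scrapy's default "Scrapy/x.y.z (+http://scrapy.org)" User-Agent
# with a browser-like one.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/43.0.2357.124 Safari/537.36"
)

# Optional assumption: also send browser-like Accept headers by default.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.8",
}
```

For a quick check without editing the project, the same setting can be overridden on the command line, for example `scrapy shell -s USER_AGENT="Mozilla/5.0 ..." "http://www.bornfitness.com/blog/page/10/"`, where the shortened User-Agent string is a placeholder.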
