I have a URL of the form:

    url = "http://www.example.com/search.html#query=test"

When I pass this to scrapy.Request as

    yield scrapy.Request(url, self.parse_result)

and pick it up in parse_result like this

    def parse_result(self, response):
        print(response.url)

the last part of the string is always stripped, and the URL is printed as follows:

    http://www.example.com/search.html
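A quick way to see which part of this URL is the fragment (the part after `#`, which is never sent to the server) is Python's standard `urllib.parse.urldefrag`. This is only an illustration of the split, not of Scrapy internals:

```python
from urllib.parse import urldefrag

url = "http://www.example.com/search.html#query=test"

# urldefrag() splits a URL into the part that is sent over HTTP
# and the fragment, which is handled on the client side only.
base, fragment = urldefrag(url)

print(base)      # http://www.example.com/search.html
print(fragment)  # query=test
```

The `base` value matches what `response.url` printed above. If the callback only needs the original string, one option is to carry it along yourself, e.g. `scrapy.Request(url, self.parse_result, meta={"original_url": url})`, and read `response.meta["original_url"]` in `parse_result`; the key name `original_url` is arbitrary.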
What do I need to do to get the full string from response.url, including the #query=test part? I tried using the %23 code instead of the hash sign, but it is just passed on literally as %23 rather than being treated as a hash. And using

    urllib.parse.quote(url)

raises a ValueError:

    ValueError: Missing scheme in request
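The ValueError can be reproduced with the standard library alone: `quote()` escapes every character outside its `safe` set (default `"/"`), so the `:` after `http` turns into `%3A` and the URL no longer has a recognizable scheme. The sketch below also shows why `%23` does not work as a hash: once escaped, it is just part of the path.

```python
from urllib.parse import quote, urlsplit

url = "http://www.example.com/search.html#query=test"

# With the default safe="/", ":" becomes %3A, "#" becomes %23, "=" becomes %3D.
quoted = quote(url)
print(quoted)                          # http%3A//www.example.com/search.html%23query%3Dtest
print(repr(urlsplit(quoted).scheme))   # '' (no scheme left, hence the ValueError)

# Keeping ":" safe preserves the scheme, but "#" is still escaped to %23,
# so it ends up inside the path instead of starting a fragment.
quoted_keep_scheme = quote(url, safe=":/")
print(quoted_keep_scheme)                  # http://www.example.com/search.html%23query%3Dtest
print(urlsplit(quoted_keep_scheme).path)   # /search.html%23query%3Dtest
```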
Peter, the thing is that servers never receive the hash (or fragment identifier, which is what that part of a URL is called). Per https://en.wikipedia.org/wiki/Fragment_identifier, "its processing is exclusively client-side". In your case, this means there is some JS on the web page that picks up the hash after the page has loaded, processes it and brings the page to its actual state. Out of the box, Scrapy is not capable of executing JS. So you have a few options here:

- Open the Network tab of your browser and check whether the browser is making any XHR/Ajax requests. If it is, they may contain the information you need to scrape.

(Avoid Inspect Element: it shows you the HTML after it has been processed by JS. Instead, use View page source, which shows exactly what the server has sent you.)
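If the Network tab does reveal such an XHR request, and it takes the same key=value pairs that appear in the fragment, the fragment can be moved into a real query string that the server will actually receive. The sketch below assumes exactly that; the endpoint path `/api/search` and the helper name `fragment_to_api_url` are hypothetical placeholders, and the real endpoint and parameter names have to be read off the Network tab.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# HYPOTHETICAL endpoint path: the real one (if any) must be found in the
# browser's Network tab; "/api/search" is only a placeholder.
API_PATH = "/api/search"


def fragment_to_api_url(page_url: str, api_path: str = API_PATH) -> str:
    """Move the key=value pairs from a page URL's fragment into the query
    string of a (hypothetical) XHR endpoint on the same host."""
    parts = urlsplit(page_url)
    # Assumes the fragment is formatted like a query string, e.g. "query=test".
    params = parse_qsl(parts.fragment)
    # Rebuild the URL: same scheme and host, new path, params as the query, no fragment.
    return urlunsplit((parts.scheme, parts.netloc, api_path, urlencode(params), ""))


print(fragment_to_api_url("http://www.example.com/search.html#query=test"))
# http://www.example.com/api/search?query=test
```

In the spider, the resulting URL would then be requested as usual, e.g. `yield scrapy.Request(fragment_to_api_url(url), callback=self.parse_result)`.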