I am using Scrapy to scrape a website that lists some specific data, but the page has a Show More button that has to be clicked many times before all of the data is displayed.
The URL of the page:
www.websiteiamscraping.com/data/sheet/historical?s=MSA:CAS
When I click the Show More button, this is the URL that is requested: www.websiteiamscraping.com/data/ajax/getmorehistoricalsheets?StartDate=42598&s=MSA%3ACAS&isLRS=false
(the StartDate parameter changes every time I click the button)
The response contains the additional data as HTML inside a JSON object, but the HTML seems to be mixed with escaped characters, for example: 55.21k\u003c/span\u003e
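For context, sequences such as \u003c and \u003e are standard JSON Unicode escapes for < and >, so a JSON parser turns them back into ordinary HTML. A minimal sketch with Python's json module follows; the key name "html" and the sample body are assumptions made for illustration, since the real structure of the response is not shown here.

```python
import json

# Hypothetical response body. The key name "html" is an assumption,
# not taken from the real site.
raw_body = '{"html": "\\u003cspan\\u003e55.21k\\u003c/span\\u003e"}'


def extract_html(body, key="html"):
    """Parse a JSON body and return the HTML string stored under `key`."""
    payload = json.loads(body)  # decodes \u003c to "<" and \u003e to ">"
    return payload[key]


print(extract_html(raw_body))
```

The returned string is plain HTML, so it could then be handed to an HTML parser or selector in the usual way.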
My code is as follows:
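import scrapy


class DataSpider(scrapy.Spider):
    name = "data"
    # Scrapy needs the scheme (http:// or https://) in each start URL
    start_urls = [
        'http://www.websiteiamscraping.com/data/sheet/historical?s=MSA:CAS'
    ]

    def parse(self, response):
        # "...?s=MSA:CAS" -> "MSA", used to name the output file
        page = response.url.split("=")[1].split(":")[0]
        filename = 'data-%s.html' % page
        # Save the raw response body to disk
        with open(filename, 'wb') as f:
            f.write(response.body)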
Question: how can I get all of the data that the page loads? (The page I am scraping is not the same as the URL that returns the JSON data.)
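One direction that might work is to skip the button and request the AJAX URL directly, once per StartDate value. The sketch below only rebuilds that URL from its parts. The helper name build_more_url, the AJAX_BASE constant, and the http:// scheme are assumptions for illustration, and how the next StartDate value is obtained is not known from what is shown above.

```python
from urllib.parse import urlencode

# Assumed scheme: the URLs above are written without http:// or https://
AJAX_BASE = "http://www.websiteiamscraping.com/data/ajax/getmorehistoricalsheets"


def build_more_url(start_date, symbol="MSA:CAS"):
    """Build the Show More URL for one StartDate value (hypothetical helper)."""
    # urlencode percent-encodes ":" as %3A, matching the URL seen in the browser
    query = urlencode({"StartDate": start_date, "s": symbol, "isLRS": "false"})
    return f"{AJAX_BASE}?{query}"


print(build_more_url(42598))
```

Inside a spider, each such URL could be passed to scrapy.Request with a callback that parses the JSON body, repeating until no more data comes back.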