I am using Scrapy to scrape a website that lists some specific data, but the page has a Show More button that must be clicked many times before all of the data is loaded.

The URL of the page: www.websiteiamscraping.com/data/sheet/historical?s=MSA:CAS

When I click the Show More button, this is the URL that is requested: www.websiteiamscraping.com/data/ajax/getmorehistoricalsheets?StartDate=42598&s=MSA%3ACAS&isLRS=false (the StartDate parameter changes every time I click the button)

That request returns the additional data as HTML inside a JSON object, but the HTML seems to be mixed with escape sequences like: 55.21k\u003c/span\u003e
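To make the request and response format described above concrete, here is a minimal sketch using only the standard library. It assumes an `http://` scheme, and the JSON key `html` and the sample body are hypothetical, since the real response layout is not shown here. It suggests that sequences such as `\u003c` are ordinary JSON escapes for `<` and `>`, which disappear once the body is parsed as JSON.

```python
import json
from urllib.parse import urlencode

# Assumption: the scheme is http://; the path and parameters are copied
# from the Show More request shown above.
AJAX_URL = "http://www.websiteiamscraping.com/data/ajax/getmorehistoricalsheets"


def build_more_url(start_date, symbol="MSA:CAS"):
    """Build the Show More request URL for a given StartDate value."""
    query = urlencode({"StartDate": start_date, "s": symbol, "isLRS": "false"})
    return f"{AJAX_URL}?{query}"


def extract_html(raw_json, key="html"):
    """Parse the JSON body and return the embedded HTML fragment.

    The key name "html" is hypothetical; inspect the real response
    to find the field that actually holds the markup.
    """
    return json.loads(raw_json)[key]


# Made-up sample body that mimics the escaped HTML seen in the response.
sample = r'{"html": "\u003cspan\u003e55.21k\u003c/span\u003e"}'

print(build_more_url(42598))
print(extract_html(sample))
```

Once decoded, the fragment is plain HTML and can be parsed with the same selectors used for the main page.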

My code is as follows:

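import scrapy


class DataSpider(scrapy.Spider):
    name = "data"

    # Scrapy rejects URLs without a scheme (http:// is assumed here)
    start_urls = [
        'http://www.websiteiamscraping.com/data/sheet/historical?s=MSA:CAS'
    ]

    def parse(self, response):
        # Take the symbol prefix ("MSA") from the query string for the filename
        page = response.url.split("=")[1].split(":")[0]
        filename = 'data-%s.html' % page
        # Save the raw page body to disk
        with open(filename, 'wb') as f:
            f.write(response.body)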

Question: How do I get all of the data that is loaded into the page I want? (The page I scrape is different from the URL that returns the JSON data.)
