I need to build a Python script that scrapes a web page to retrieve a number from a "Show More" button.
This number is then used as a parameter to request a URL that returns JSON containing some data plus a number. That number is used as the parameter for the next request, which again returns data plus a number, and so on. The process continues until a response contains empty data (plus a number). When the data is empty, the scraper should stop.
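Stripped of any particular library, the chain described above is a plain sequential loop. A minimal sketch, assuming a hypothetical fetch callable that requests the JSON endpoint for a given number and returns the decoded JSON shaped like {"data": [...], "number": N}:

```python
from typing import Any, Callable, Iterator


def follow_chain(first_number: Any, fetch: Callable[[Any], dict]) -> Iterator:
    """Yield every item from the chained JSON responses.

    `fetch` is hypothetical: it should request the endpoint for `number` and
    return the decoded JSON, assumed to be {"data": [...], "number": N}.
    """
    number = first_number
    while True:
        payload = fetch(number)
        if not payload["data"]:
            break  # empty data: the chain is finished
        yield from payload["data"]
        number = payload["number"]  # parameter for the next request
```

Each request depends on the previous response, so only one request is ever outstanding.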
I tried Scrapy, but it doesn't work for this. Scrapy is asynchronous, and in my case I need to wait for each JSON result to get the information required to request the next URL, and so on.
Which Python library would you suggest? I have read that Selenium can do the job, but that it is much slower than Scrapy.
Scrapy's asynchronous behaviour matters most when you have multiple URLs to scrape at the same time. In your case, each new request is enqueued only after the previous response has been parsed, so the requests run one after another and the asynchrony shouldn't be a problem.
I don't know the exact structure of your JSON response, so let's assume it has two keys, "data" and "number". You could write a Scrapy spider with a parsing method similar to this: