Apify - How to scrape multiple pages (request queue) with a dynamic "next page" button?


I'm setting up a new web scraper using Apify to scrape a page with pagination. Usually, I'd use the request queue, Link Selector, Pseudo-URL method. However, the page I'm trying to scrape has dynamic "next page" buttons, and the link is triggered via a JavaScript function.

What would be the best way to tell Apify's web scraper to go to the next page?

Any way to simulate a manual click on the button?

Or to use the number sequence at the end of the URL (www.domain.com/discover/recent?page=2)?

There are 2 answers below.


Looking at this particular website, it looks like every next page (as you already mentioned) has ?page=<i> in the URL, so you can simply enqueue the next page at the end of the page function using context.enqueueRequest().
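A minimal sketch of that idea: a pure helper that steps the ?page=<i> parameter, which you would then pass to context.enqueueRequest() inside your pageFunction. The helper name is my own; only enqueueRequest and request.url come from Apify's pageFunction context.

```javascript
// Hypothetical helper: compute the next ?page=<i> URL from the current one.
// Pages without a ?page parameter are treated as page 1.
function nextPageUrl(currentUrl) {
    const url = new URL(currentUrl);
    const page = Number(url.searchParams.get('page') || '1');
    url.searchParams.set('page', String(page + 1));
    return url.toString();
}

// Inside a Web Scraper pageFunction you would then enqueue it, e.g.:
//
// async function pageFunction(context) {
//     // ...scrape the items on the current page...
//     // Enqueue the next page; stop once a page comes back empty
//     // (the request queue skips URLs it has already seen).
//     await context.enqueueRequest({ url: nextPageUrl(context.request.url) });
//     return results;
// }
```

Because the request queue deduplicates URLs, enqueueing one page ahead from every page is enough to walk the whole sequence.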

Another option is to use CheerioCrawler with the XHR links, which look like this one: https://webflow.com/api/discover/sites/recent?limit=12&offset=0&sort=&cloneable=false&tag=, where offset would be 0, 12, 24, etc. This way you get an array of structured JSON objects in the response (each representing the 12 items loaded on a page), and you also save some Compute Units, since you don't need a browser at all.
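To illustrate the offset stepping, here is a small helper that builds the API URL for a given offset; the endpoint and the limit/offset parameters come from the URL above, while the helper name and the stopping rule are assumptions.

```javascript
// Base endpoint taken from the XHR request observed in the browser.
const API = 'https://webflow.com/api/discover/sites/recent';

// Hypothetical helper: build the API URL for one page of results.
function apiUrl(offset, limit = 12) {
    const url = new URL(API);
    url.searchParams.set('limit', String(limit));
    url.searchParams.set('offset', String(offset));
    return url.toString();
}

// With CheerioCrawler (or plain fetch) you would request apiUrl(0),
// apiUrl(12), apiUrl(24), ... and stop once a response returns fewer
// than `limit` items.
```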

Hope this helps!


If you can, use the URL. It lets you split the work cleanly per page, and it is the idiomatic way to use Apify. You can enqueue any URL using await context.enqueueRequest({ url }).

If there is no URL for the page, then you have to click through it all inside a single page function.
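The click-through approach can be sketched as a loop: scrape the current view, trigger the "next page" button's JavaScript handler, wait for the new content, and repeat until the button disappears. The sketch below injects the two page interactions as functions so the control flow is clear; in a real pageFunction, scrapePage would read the DOM (e.g. via context.jQuery) and clickNext would trigger the button and await context.waitFor(...) before returning. All names here are assumptions for illustration.

```javascript
// Hypothetical click-through loop. scrapePage() returns the items currently
// visible; clickNext() advances to the next page and resolves to false once
// there is no next page left.
async function scrapeAllPages({ scrapePage, clickNext }) {
    const results = [];
    for (;;) {
        // Collect everything on the current page.
        results.push(...(await scrapePage()));
        // Advance; stop when the "next page" button is gone.
        const moved = await clickNext();
        if (!moved) break;
    }
    return results;
}
```

The trade-off versus the URL approach: all pages share one browser context and one pageFunction run, so a crash late in the loop loses the whole sequence, whereas enqueued URLs are retried independently.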