How to wait for a specific AJAX request in a Puppeteer crawler

I need to fetch the data from an AJAX request made to a GraphQL endpoint. The pages are crawled by PuppeteerCrawler:

const crawler = new Apify.PuppeteerCrawler({
  preNavigationHooks: [
    async ({ page }): Promise<void> => {
      page.on('response', async (response) => {
        if (!response.request().url().includes(GRAPHQL_PATH)) return;
        const data = await response.json();
        console.log(data[0]?.data?.search?.advert?.id);
      });
    },
  ],
  handlePageFunction: async ({ request, page }): Promise<void> => {
    switch (request.label) {
      case LABEL_LIST:
        return listHandler(page);
      case LABEL_VIEW:
        // solution 1: await page.waitForNavigation({ waitUntil: 'networkidle0' });

        // solution 2: await page.waitForResponse((response) => response.url().includes(GRAPHQL_PATH));

        return await viewHandler(page);
      default:
        break;
    }
  },
});

crawler.run();

If I run the code as is, the callback in page.on('response', ...) never gets executed. So I tried waiting for a full page load, as in solution 1, which partially worked, but it would loop and throw some errors. In solution 2, I tried to wait only for the AJAX request I need (the one to GraphQL), but that snippet does not work at all.

If it helps, the listHandler basically fetches some links on the page and adds them to the queue:

await requestQueue.addRequest({ url: scrappedUrl });
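
The handler above switches on a label, but the addRequest call shown here passes only a URL. Below is a minimal sketch of attaching a label when enqueuing, assuming labels are stored under userData.label. The buildViewRequest helper, the RequestOptions type, and the 'VIEW' value are hypothetical and not from the original post.

```typescript
// Minimal shape of the options object passed to requestQueue.addRequest().
interface RequestOptions {
  url: string;
  userData?: Record<string, unknown>;
}

// Hypothetical value; the original post does not show what LABEL_VIEW holds.
const LABEL_VIEW = 'VIEW';

// Builds a request that carries a label so the handler can route it.
function buildViewRequest(url: string): RequestOptions {
  return { url, userData: { label: LABEL_VIEW } };
}

// Usage inside listHandler (assumption):
//   await requestQueue.addRequest(buildViewRequest(scrappedUrl));
```

With a request built this way, the handler would read the label from request.userData.label. Whether request.label is also populated depends on the SDK version in use.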
There is 1 solution below

I was able to resolve the issue. Listening for a response in the preNavigationHook is redundant, because waitForResponse already returns the HTTP response. Final code:

const crawler = new Apify.PuppeteerCrawler({
  handlePageFunction: async ({ request, page }): Promise<void> => {
    switch (request.label) {
      case LABEL_LIST:
        return listHandler(page);
      case LABEL_VIEW: {
        // Wait for the GraphQL response and read its JSON body directly.
        const res = await page.waitForResponse((response) => response.url().includes(GRAPHQL_PATH));
        const data = await res.json();
        console.log(data[0]?.data?.search?.advert?.id);

        return await viewHandler(page);
      }
      default:
        break;
    }
  },
});

crawler.run();
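
The predicate in the final code matches on the URL alone, so it could also match a CORS preflight (OPTIONS) or a failed response whose body is not JSON. Below is a hedged sketch of a stricter predicate with an explicit timeout. It assumes the page calls the GraphQL endpoint via POST. ResponseLike, PageLike, isGraphqlResponse, waitForGraphqlJson, and the '/graphql' value are hypothetical names introduced for illustration. The two interfaces are structural stand-ins for the parts of Puppeteer's response and page objects used here.

```typescript
// Structural stand-ins for the Puppeteer methods this sketch relies on,
// so it compiles without puppeteer installed.
interface ResponseLike {
  url(): string;
  status(): number;
  request(): { method(): string };
  json(): Promise<unknown>;
}

interface PageLike {
  waitForResponse(
    predicate: (response: ResponseLike) => boolean | Promise<boolean>,
    options?: { timeout?: number },
  ): Promise<ResponseLike>;
}

// Hypothetical value; the original post does not show GRAPHQL_PATH.
const GRAPHQL_PATH = '/graphql';

// True only for a successful (2xx) POST whose URL contains the GraphQL path.
// Assumption: the page sends its GraphQL queries via POST.
function isGraphqlResponse(response: ResponseLike, path: string = GRAPHQL_PATH): boolean {
  const status = response.status();
  return (
    response.url().includes(path) &&
    response.request().method() === 'POST' &&
    status >= 200 &&
    status < 300
  );
}

// Waits for the first matching response and returns its parsed JSON body.
// Rejects if nothing matches within timeoutMs.
async function waitForGraphqlJson(
  page: PageLike,
  path: string = GRAPHQL_PATH,
  timeoutMs: number = 15000,
): Promise<unknown> {
  const response = await page.waitForResponse(
    (candidate) => isGraphqlResponse(candidate, path),
    { timeout: timeoutMs },
  );
  return response.json();
}
```

Note that waitForResponse only sees responses that arrive after it is called. handlePageFunction runs once navigation has finished, so if the page fires the GraphQL request during load, the wait may time out. In that case, one option is to start the wait in a preNavigationHook, store the returned promise, and await it in the handler.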