How to download a PDF file that opens in a new tab with Puppeteer?

I am trying to download an invoice from a website using Puppeteer, which I have just started learning. I am using Node to write and run the code. I have managed to log in and navigate to the invoice page, but the invoice opens in a new tab, so my code does not detect it because it is not the active tab. This is the code I used:

const puppeteer = require('puppeteer')

const SECRET_EMAIL = 'emailid'
const SECRET_PASSWORD = 'password'

const main = async () => {
  const browser = await puppeteer.launch({
    headless: false,
  })
  const page = await browser.newPage()
  await page.goto('https://my.apify.com/sign-in', { waitUntil: 'networkidle2' })
  await page.waitForSelector('div.sign_shared__SignForm-sc-1jf30gt-2.kFKpB')
  await page.type('input#email', SECRET_EMAIL)
  await page.type('input#password', SECRET_PASSWORD)
  await page.click('input[type="submit"]')
  await page.waitForSelector('#logged-user')
  await page.goto('https://my.apify.com/billing#/invoices', { waitUntil: 'networkidle2' })
  await page.waitForSelector('#reactive-table-1')
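  // Start listening for the new tab before clicking; otherwise the 'targetcreated' event can fire before the listener exists
  const newPagePromise = new Promise(resolve => browser.once('targetcreated', target => resolve(target.page())))
  await page.click('#reactive-table-1 > tbody > tr:nth-child(1) > td.number > a')
  const page2 = await newPagePromise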
  await page2.bringToFront()
  await page2.screenshot({ path: 'apify1.png' })
  //await browser.close()
}

main()

In the code above I am just trying to take a screenshot of the new tab. Can anyone help me?

There is 1 best solution below

Here is an example of a workaround for the Chromium issue mentioned in the comments above. Adapt it to fit your specific needs and use case. Basically, you need to capture the new page (target) and then do whatever is needed to download the file. If no other approach works for you (including a direct request to the download location via fetch or, ideally, some request library on the back end), you can pass the file to Node as a buffer, as in the example below.
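For the back-end route mentioned in passing above, a minimal sketch might look like the following. It is not part of the original workaround. It assumes Node 18+ (for the global `fetch`), that the site authenticates with cookies, and that `page.cookies()` is available in your Puppeteer version; `toCookieHeader` and `downloadWithSessionCookies` are hypothetical helper names introduced here for illustration.

```javascript
// Turn Puppeteer cookie objects ({ name, value, ... }) into a Cookie header value.
function toCookieHeader(cookies) {
    return cookies.map(cookie => `${cookie.name}=${cookie.value}`).join('; ');
}

// Hypothetical helper: re-request the PDF from Node using the browser session's cookies.
// Assumption: the download URL only needs the session cookies, no extra tokens or headers.
async function downloadWithSessionCookies(page, url) {
    const cookies = await page.cookies(url);
    const response = await fetch(url, {
        headers: { cookie: toCookieHeader(cookies) },
    });
    if (!response.ok) {
        throw new Error(`Download failed with HTTP ${response.status}`);
    }
    // Convert the response body into a Node Buffer.
    return Buffer.from(await response.arrayBuffer());
}
```

If the site relies on something other than cookies (for example a token kept in local storage), this sketch will not work as written, and the in-page approach is the safer fallback.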

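// ATT_page is the already logged-in page showing the invoice table (presumably `page` in the question)
const [PDF_page] = await Promise.all([
    // Start waiting for the new tab before the click so the target is not missed
    browser
        .waitForTarget(target => target.url().includes('my.apify.com/account/invoices/'))
        .then(target => target.page()),
    ATT_page.click('#reactive-table-1 > tbody > tr:nth-child(1) > td.number > a'),
]);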

const asyncRes = PDF_page.waitForResponse(response =>
    response
        .request()
        .url()
        .includes('my.apify.com/account/invoices'));

await PDF_page.reload();
const res = await asyncRes;
const url = res.url();
const headers = res.headers();

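// Bail out if the response is not a PDF (this snippet is meant to run inside an async function)
if (!(headers['content-type'] || '').includes('application/pdf')) {
    await PDF_page.close();
    return null;
}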

const options = {
    // target request options
};

const pdfAb = await PDF_page.evaluate(
    async (url, options) => {
        function bufferToBase64(buffer) {
            return btoa(
                new Uint8Array(buffer).reduce((data, byte) => {
                    return data + String.fromCharCode(byte);
                }, ''),
            );
        }

        return await fetch(url, options)
            .then(response => response.arrayBuffer())
            .then(arrayBuffer => bufferToBase64(arrayBuffer));
    },
    url,
    options,
);

const pdf = Buffer.from(pdfAb, 'base64');
await PDF_page.close();
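
Once `pdf` holds the file contents as a Node `Buffer`, it still has to be written somewhere. A minimal sketch of that last step follows; it is an addition rather than part of the original workaround, and `savePdfBuffer` and the output path are hypothetical names chosen for illustration. The signature check relies on PDF files normally starting with the bytes `%PDF-`.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: sanity-check the buffer and write it to disk.
function savePdfBuffer(pdf, filePath) {
    // A PDF normally begins with the ASCII signature "%PDF-".
    if (pdf.subarray(0, 5).toString('latin1') !== '%PDF-') {
        throw new Error('Downloaded data does not look like a PDF');
    }
    // Create the target directory if it does not exist yet.
    fs.mkdirSync(path.dirname(filePath), { recursive: true });
    fs.writeFileSync(filePath, pdf);
    return filePath;
}

// Usage with the buffer from the snippet above (the path is an assumption):
// savePdfBuffer(pdf, path.join(__dirname, 'invoices', 'invoice.pdf'));
```

The check is only a cheap guard against saving an HTML login or error page under a `.pdf` name; it does not validate the rest of the file.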