Detect blank page in PDF in NodeJS using pdf-lib


I spent the day trying to find a way, on my NodeJS server, to detect whether the last page of a generated PDF is blank so that I can remove it. I tried pdf-lib and several other libraries along the way, with no success. Currently, I'm running the following function:

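const { PDFDocument } = require("pdf-lib")
const { promisify } = require("util")
const libre = require("libreoffice-convert")

// Assumption: libre_convert is libreoffice-convert's callback API wrapped in a Promise
const libre_convert = promisify(libre.convert)

// Converts each DOCX buffer to PDF and concatenates the results, keeping the
// last page of each converted document only if page_is_not_empty() says so.
const docx_buffers_to_pdf_buffer = async (docx_buffers) => {
    const combined_pdf_buffer = await PDFDocument.create()
    for(const docx_buffer of docx_buffers){
        const pdf_buffer = await PDFDocument.load(await libre_convert(docx_buffer, "pdf", undefined))
        const pages = await combined_pdf_buffer.copyPages(pdf_buffer, pdf_buffer.getPageIndices())
        // Always keep every page but the last (a one-page document keeps its only page)
        const n = Math.max(1, pages.length - 1)
        for(let page_index = 0; page_index < n; page_index++){
            combined_pdf_buffer.addPage(pages[page_index])
        }
        if(pages.length > 1){
            // Copy the last page into a standalone one-page PDF so it can be tested on its own
            let temp_pdf_buffer = await PDFDocument.create()
            const temp_page = (await temp_pdf_buffer.copyPages(pdf_buffer, [pages.length - 1]))[0]
            temp_pdf_buffer.addPage(temp_page)
            temp_pdf_buffer = await temp_pdf_buffer.save()
            // page_is_not_empty is still to be written (it currently always returns true)
            if(await page_is_not_empty(temp_pdf_buffer)){
                combined_pdf_buffer.addPage(pages[pages.length - 1])
            }
        }
    }
    return await combined_pdf_buffer.save()
}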

The logic works fine and the PDF is generated correctly, but I don't know how to write page_is_not_empty: right now it just returns true, so the last page is always included.
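One way to fill in page_is_not_empty without rendering anything is to look at the page's content-stream operators: a page that never shows text, paints a path, draws an XObject or inline image, or paints a shading leaves no marks. The sketch below covers only that check. The helper name content_paints_something and its operator list are hypothetical, it expects content text that has already been decoded (decompressed) from the page, and it is a heuristic: a white rectangle covering the page, or text drawn outside the visible area, would still count as "not empty".

```javascript
// Hypothetical helper (not part of pdf-lib): scans decoded content-stream text
// for operators that put marks on the page. Heuristic only: it ignores colors
// (white-on-white still counts as a mark) and where the marks land.
const PAINTING_OPERATORS = new Set([
    "Tj", "TJ", "'", "\"",                          // show text
    "S", "s", "f", "F", "f*", "B", "B*", "b", "b*", // stroke and/or fill a path
    "Do",                                           // draw an XObject (image or form)
    "sh",                                           // paint a shading
    "BI",                                           // begin an inline image
])

const content_paints_something = (content) => {
    const tokens = content
        .replace(/\((?:\\[\s\S]|[^\\()])*\)/g, " ") // drop literal strings (nested parens not handled)
        .replace(/<[0-9A-Fa-f\s]*>/g, " ")          // drop hex strings
        .replace(/%[^\r\n]*/g, " ")                 // drop comments
        .replace(/\/[^\s\[\]<>(){}\/%]*/g, " ")     // drop names such as /F1 or /S
        .split(/[\s\[\]<>(){}]+/)
    return tokens.some((token) => PAINTING_OPERATORS.has(token))
}
```

page_is_not_empty could then return the result of this check for the decoded content of the single page in temp_pdf_buffer. Stripping strings, comments, and names first keeps text such as (f S) or a name like /S from being mistaken for painting operators.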

I had the idea of rendering the last page to an image and checking whether it is blank: page_is_not_empty would return true only if the image contains something, so a blank last page would be left out of the final document.
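If you go the rendering route, the blank check itself is simple once the page is rasterized. The sketch below assumes you already have an RGBA pixel buffer for the page (4 bytes per pixel, the layout of a canvas ImageData.data) produced by some renderer that is not shown here, for example pdf.js drawing onto a node-canvas canvas. The helper name rgba_is_blank and the default tolerance of 8 are hypothetical choices.

```javascript
// Hypothetical helper for the render-to-image approach: true when every pixel
// of an RGBA buffer (4 bytes per pixel, as in canvas ImageData.data) is close
// to white. `tolerance` is how far below 255 a color channel may drop.
const rgba_is_blank = (rgba, tolerance = 8) => {
    if (rgba.length % 4 !== 0) {
        throw new RangeError("expected RGBA data: length must be a multiple of 4")
    }
    const min = 255 - tolerance
    for (let i = 0; i < rgba.length; i += 4) {
        // Fully transparent pixels are treated as background (an unpainted canvas)
        if (rgba[i + 3] === 0) continue
        if (rgba[i] < min || rgba[i + 1] < min || rgba[i + 2] < min) return false
    }
    return true
}
```

The tolerance absorbs near-white antialiasing noise. If the renderer fills the canvas white before drawing, no pixel will be transparent and the alpha check simply never triggers.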

I'm running out of ideas. I didn't expect checking for a blank page to be this hard; maybe I'm missing a key point...

I even tried digging into page.node, with no success, and into page.getContentStream().operators; the latter is an empty array whether or not the page is actually empty, which really confuses me.

Even weirder, the documentation doesn't seem to be up to date: I was able to call getContentStream() even though it isn't referenced there at all...
