pypdf: arrange pages of different pdfs in a single page as a grid


I have several pdf files 1.pdf, 2.pdf, ..., n.pdf, each with 10 pages, each page being the same size.

I want to create another file summary.pdf

  • containing only one page, with all the pages of all the PDFs; and
  • that page should look like a grid: its first row must contain all 10 pages of 1.pdf, its second row all 10 pages of 2.pdf, etc.

Is this doable with pypdf or another pure Python package?


There are 3 best solutions below

3
mnekkach On

You can try this code as a test. You need to install PyPDF2 first: pip install PyPDF2

import PyPDF2

def merge_pdfs(input_files, output_file):
    """Append every page of every input file, in order, to one output PDF."""
    pdf_writer = PyPDF2.PdfWriter()

    for input_file in input_files:
        pdf_reader = PyPDF2.PdfReader(input_file)

        # PyPDF2 3.x removed numPages/getPage/addPage; use .pages and add_page.
        for page in pdf_reader.pages:
            pdf_writer.add_page(page)

    with open(output_file, 'wb') as output_pdf:
        pdf_writer.write(output_pdf)

input_files = ['1.pdf', '2.pdf', '3.pdf']
output_file = 'summary.pdf'

merge_pdfs(input_files, output_file)
2
K J On

There are multiple factors you need to consider before setting out such a grid.

Firstly, assuming a standard page height of, say, 840 units, there is an upper bound on the number of rows n: a page dimension cannot exceed 14400 units without some adjustments in the PDF structure.

Thus, for standard pages, n cannot exceed 17 rows (14400 / 840 ≈ 17.1).
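
The arithmetic behind that limit, as a small sketch. The 14400-unit ceiling and the 840-unit page height are the figures quoted above; the helper name max_rows is only illustrative.

```python
# Largest page dimension quoted above, in default PDF user units.
MAX_PAGE_UNITS = 14400


def max_rows(page_height: float, limit: float = MAX_PAGE_UNITS) -> int:
    """Return how many full-size pages of the given height stack under the limit."""
    return int(limit // page_height)


print(max_rows(840))  # 17
```

The same check applies across the sheet: the 10 pages of one row must also stay under the limit in width, which is rarely a problem for ordinary page sizes.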

Secondly, the more content there is, the heavier the drain on available working memory, which leaves less for the program to run efficiently. For that reason alone it can help to split the task into chunks and then assemble those chunks in a second pass.

So limit your first attempt to a reasonable number, say 5 or 6 rows of 10.

Normally we might use a grid of 4 x 4 without a problem, then loop that again to make 16 x 16, which is likely as much as is practical.

I could attempt the whole task in a few lines of shell commands; however, you have limited your example to Python. Thus I suggest you follow the examples but scale them up.

pypdf: https://pypdf.readthedocs.io/en/stable/user/cropping-and-transforming.html#transforming-several-copies-of-the-same-page This is similar to the older PyPDF2 (deprecated) example at https://stackoverflow.com/a/72754236/10802527

Alternatively, try PyMuPDF: https://stackoverflow.com/a/75705016/10802527

Many other Python wrappers may simply call Ghostscript, and that takes just 2 lines to do the whole job.

GS\10021\bin>gs -q -sDEVICE=pdfwrite  -o Allx9Pages.pdf -f 1x9.pdf 2x9.pdf 3x9.pdf 4x9.pdf 5x9.pdf 6x9.pdf 7x9.pdf 8x9.pdf 9x9.pdf

GS\10021\bin>gs -q -sDEVICE=pdfwrite -dDEVICEWIDTHPOINTS=5508 -dDEVICEHEIGHTPOINTS=7128 -o 9x9-Pages.pdf -sNupControl=9x9 -f Allx9Pages.pdf

However, you may need to make the controls more specific to your case, as here I simply did 9 x 9. Famous Band with John Dummer
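
To adapt the two Ghostscript commands to another grid, their argument lists can be generated from Python. This is only a sketch: nup_commands is a hypothetical helper, the executable name gs is assumed to be on the PATH, and I am assuming that -sNupControl takes columns x rows (with 9 x 9 the order does not matter, so check the Ghostscript documentation before relying on it). The sheet size 5508 x 7128 used above equals 9 x 612 by 9 x 792, which suggests US Letter source pages; that reading is also an assumption.

```python
def nup_commands(inputs, cols, rows, page_width, page_height,
                 merged="merged.pdf", output="grid.pdf", gs="gs"):
    """Build two Ghostscript argument lists: concatenate, then N-up."""
    # Step 1: join all input files into one multi-page PDF.
    concat = [gs, "-q", "-sDEVICE=pdfwrite", "-o", merged, "-f", *inputs]
    # Step 2: lay the merged pages out on one sheet sized cols x rows pages.
    nup = [
        gs, "-q", "-sDEVICE=pdfwrite",
        f"-dDEVICEWIDTHPOINTS={cols * page_width}",
        f"-dDEVICEHEIGHTPOINTS={rows * page_height}",
        "-o", output,
        f"-sNupControl={cols}x{rows}",
        "-f", merged,
    ]
    return concat, nup


# Rebuild the 9 x 9 example from the commands above.
concat_cmd, nup_cmd = nup_commands(
    [f"{i}x9.pdf" for i in range(1, 10)], 9, 9, 612, 792,
    merged="Allx9Pages.pdf", output="9x9-Pages.pdf",
)
print(" ".join(concat_cmd))
print(" ".join(nup_cmd))
```

Each list can then be passed to subprocess.run(cmd, check=True) if Ghostscript is installed.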


0
Tom M. Ragonneau On

I ended up doing it using the transformations of pypdf. I put my code below for future reference.

from pypdf import PdfReader, PdfWriter, Transformation


n = 3  # number of PDF files
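pdf_width = 415  # width of one source page
pdf_height = 321  # height of one source page

output = PdfWriter()
# One blank sheet, wide enough for 10 pages and tall enough for n rows.
output.add_blank_page(pdf_width * 10, pdf_height * n)
for i_pdf in range(n):
    source = PdfReader(f'{i_pdf + 1}.pdf')
    for i_page, page in enumerate(source.pages):
        # The PDF origin is the bottom-left corner, so the first file
        # gets the largest vertical offset and lands in the top row.
        output.pages[0].merge_transformed_page(
            page,
            Transformation().translate(i_page * pdf_width, (n - i_pdf - 1) * pdf_height),
        )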
output.write('output.pdf')
output.close()
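
A possible variation on the code above, as a sketch rather than a tested solution: if the sheet of full-size pages would exceed the 14400-unit limit mentioned in another answer (many rows, for instance), every page can be scaled down by a common factor before being placed. The names grid_layout and build_summary are hypothetical, and the code assumes that all pages share the size of the first page and have their lower-left corner at (0, 0). The pypdf import sits inside build_summary so that the layout arithmetic can be tried on its own.

```python
MAX_PAGE_UNITS = 14400  # page-size limit quoted in another answer


def grid_layout(n_rows, n_cols, width, height, limit=MAX_PAGE_UNITS):
    """Return (scale, sheet_width, sheet_height, offsets).

    offsets[row][col] is the (tx, ty) of that cell's lower-left corner.
    Row 0 is the top row, because the PDF origin is at the bottom-left.
    """
    # Shrink only if the full-size sheet would exceed the limit.
    scale = min(1.0, limit / (n_cols * width), limit / (n_rows * height))
    cell_w, cell_h = width * scale, height * scale
    offsets = [
        [(col * cell_w, (n_rows - row - 1) * cell_h) for col in range(n_cols)]
        for row in range(n_rows)
    ]
    return scale, n_cols * cell_w, n_rows * cell_h, offsets


def build_summary(paths, out_path, n_cols=10):
    """Place the pages of each file in `paths` on one row of a single sheet."""
    from pypdf import PdfReader, PdfWriter, Transformation

    readers = [PdfReader(path) for path in paths]
    box = readers[0].pages[0].mediabox  # assumed identical for every page
    scale, sheet_w, sheet_h, offsets = grid_layout(
        len(readers), n_cols, float(box.width), float(box.height)
    )
    writer = PdfWriter()
    sheet = writer.add_blank_page(sheet_w, sheet_h)
    for row, reader in enumerate(readers):
        for col, page in enumerate(reader.pages):
            if col >= n_cols:  # ignore pages beyond the row length
                break
            tx, ty = offsets[row][col]
            # Scale first, then move the page into its cell.
            sheet.merge_transformed_page(
                page, Transformation().scale(scale, scale).translate(tx, ty)
            )
    writer.write(out_path)
```

For the three files above, build_summary(['1.pdf', '2.pdf', '3.pdf'], 'output.pdf') should give the same layout as the original code, since no scaling is needed at that size.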