Is it possible to put the captions generated by AI models back into the pdf file?

I have a PDF with multiple pages, where each page contains text and/or images. I have found ways to extract the images from a PDF file, and I have found ways to use AI models to generate captions for images. But is it possible to put the captions generated by the AI model back into the PDF, next to the corresponding images? If so, which library should I use? Or does anyone know how to code it?

Answer by Jorj McKie

You can use PyMuPDF to write text to PDF pages, and it offers several ways to do so.

Note: I am a maintainer and the original creator of PyMuPDF.

First, locate the image's position on the page. Then choose a rectangle (for example, directly above or below the image's bounding box) to receive the caption text.

For example, if the image's bounding box is called bbox, define rect = (bbox.x0, bbox.y1, bbox.x1, bbox.y1 + 20). This is a rectangle directly below the image, with the same width as bbox and a height of 20 points.

Then call page.insert_htmlbox(rect, caption) to write the caption text into that rectangle.

That method also lets you align the caption text (e.g. center it) with CSS styling, as in page.insert_htmlbox(rect, caption, css="* {text-align: center;}").