Is there a way to measure the margins of a PDF using Python?

437 Views Asked by Yehuda At 26 January 2023 at 03:23

I've been using different Python packages to parse PDFs, but I'm wondering if it's possible to measure the margin of a particular line in the document. Ideally, the measurement would be in CSS-style pixels.

It doesn't need to be that precise. I just need to figure out whether a line is left-aligned, centered, or right-aligned based on its margin, measured from the left edge.

Example:

# margin <= x
left-aligned

# margin >= y && margin <= z
                            center-aligned

# margin >= z
                                                              right-aligned

Obviously this is just an example, but the set of distinct margins will be small. The PDFs I'm parsing will likely have (in CSS terms):

margin-left: 0
margin-left: x
margin-left: y

The actual values of x and y are unimportant; the important thing is that they'll be consistent.

Sorry if this is confusing. The main thing I'm asking for is clarification or help in figuring out the left margin of every line in a PDF.


There is 1 best solution below

Joris Schellekens On 27 January 2023 at 08:54

disclaimer: I am the author of borb, the library used in this answer

You can use SimpleLineOfTextExtraction in borb, which returns the lines of text in a PDF.

You can check out this class here.

Each line has a content box (and a layout box), which can give you information about the location of that particular line of text.
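Since the question asks for CSS-style pixels, here is a minimal sketch of the unit conversion. It assumes the box coordinates are in default PDF user space units (1/72 inch, i.e. points) and that the page does not override this with a UserUnit entry. The function and parameter names (`points_to_css_px`, `left_margin_px`, `box_x_pt`, `page_left_pt`) are hypothetical and not part of borb.

```python
# Default PDF user space: 72 units per inch. CSS defines 1px as 1/96 inch.
PT_PER_INCH = 72.0
CSS_PX_PER_INCH = 96.0


def points_to_css_px(points: float) -> float:
    """Convert a length in PDF points to CSS pixels."""
    return points * CSS_PX_PER_INCH / PT_PER_INCH


def left_margin_px(box_x_pt: float, page_left_pt: float = 0.0) -> float:
    """Left margin of a line in CSS pixels.

    box_x_pt is the x coordinate of the line's box (hypothetical input,
    taken from whatever the extraction library reports); page_left_pt is
    the left edge of the page, assumed to be 0 unless the page says otherwise.
    """
    return points_to_css_px(box_x_pt - page_left_pt)
```

For example, a line whose box starts 36 points from the left edge of the page has a left margin of 48 CSS pixels.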

You can use this to determine whether a line is left/right/middle aligned by comparing it to lines above/below it.
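The comparison could look something like the sketch below. It does not call borb at all: it assumes you have already extracted each line's box as an `(x, width)` pair, and it compares every line against the left and right edges of the whole text block. The function name `classify_lines` and the `tolerance` parameter are hypothetical, and the threshold would need tuning for real documents.

```python
from typing import List, Tuple


def classify_lines(
    boxes: List[Tuple[float, float]], tolerance: float = 2.0
) -> List[str]:
    """Label each line 'left', 'center', 'right', or 'other'.

    boxes holds (x, width) for each line, in any consistent unit.
    The text block is taken to span from the smallest x to the
    largest x + width among all lines.
    """
    if not boxes:
        return []

    block_left = min(x for x, _ in boxes)
    block_right = max(x + w for x, w in boxes)
    block_center = (block_left + block_right) / 2.0

    labels = []
    for x, w in boxes:
        if abs(x - block_left) <= tolerance:
            # Starts at the left edge (this includes full-width lines).
            labels.append("left")
        elif abs((x + w) - block_right) <= tolerance:
            # Ends at the right edge without starting at the left edge.
            labels.append("right")
        elif abs((x + w / 2.0) - block_center) <= tolerance:
            # Midpoint coincides with the midpoint of the text block.
            labels.append("center")
        else:
            labels.append("other")
    return labels


# Usage with made-up boxes: a full-width line, a centered one, a right-aligned one.
print(classify_lines([(72.0, 451.0), (247.5, 100.0), (423.0, 100.0)]))
```

Checking the left edge first means a line that fills the whole width is reported as left-aligned. Indented lines (the `margin-left: x` case in the question) fall into 'other', so grouping those by their x value would be a natural next step.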

You can find an example of how to use this class here.

Essentially you open a document using the PDF.loads method, passing along an EventListener.
