How to segment text in PDF files to extract headings

Asked by USMAN SIDDIQUI on 14 October 2023 at 20:53 (55 views)

Let's say I have a couple hundred PDF files. From each one I need to extract every heading together with the text that belongs to it, so that each heading's text can be processed further. How do I do that while keeping the format of the file? I have tried the PyPDF2 and pdfminer libraries. They are good at extracting text, but I need to get the headings and the body text out separately. Would converting the files to XML help to get the headings out?
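On the XML idea: pdfminer.six ships a `pdf2txt.py` script that can write XML (`pdf2txt.py -t xml -o out.xml input.pdf`), in which each glyph is a `<text>` element with `font` and `size` attributes, nested inside `<textline>` and `<textbox>` elements. The XML does not label headings by itself, but it exposes font sizes, which are a usable signal. The sketch below is a minimal example that assumes headings are set in a larger font than the body text; the function names and the 1.15 size ratio are hypothetical choices, not part of pdfminer.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def lines_from_pdfminer_xml(xml_data):
    """Return (text, avg_font_size) for every <textline> in pdfminer's XML output.

    xml_data may be str or bytes (e.g. open("out.xml", "rb").read()).
    """
    root = ET.fromstring(xml_data)
    lines = []
    for textline in root.iter("textline"):
        chars = textline.findall("text")
        text = "".join(c.text or "" for c in chars).strip()
        # Only glyph elements carry a size attribute; whitespace <text> elements do not.
        sizes = [float(c.get("size")) for c in chars if c.get("size")]
        if text and sizes:
            lines.append((text, sum(sizes) / len(sizes)))
    return lines


def mark_headings(lines, min_ratio=1.15):
    """Flag lines whose font size is noticeably larger than the most common (body) size.

    min_ratio is a hypothetical threshold; tune it on your own documents.
    """
    if not lines:
        return []
    body_size = Counter(round(size, 1) for _, size in lines).most_common(1)[0][0]
    return [(text, size >= body_size * min_ratio) for text, size in lines]
```

Treating the most frequent font size as "body text" works reasonably well for reports and papers, but documents that use bold rather than larger type for headings will need an extra rule.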

As mentioned, I have tried PyPDF2 and pdfminer. In my case they extract all of the text correctly, but I still cannot separate out the headings, which I need in order to build some context around each block of text.
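The same font-size idea can be applied without an XML round-trip by walking pdfminer.six's layout objects directly: `extract_pages()` yields pages, text boxes contain `LTTextLine` objects, and each `LTChar` exposes `size` and `fontname`. The sketch below assumes a heading is either noticeably larger than the median font size or set entirely in a font whose name contains "Bold"; `extract_lines`, `group_sections`, `Section`, and both thresholds are hypothetical names and values to tune. Multi-column layouts, running page headers, and fully bold body sentences will need extra rules.

```python
from dataclasses import dataclass, field
from statistics import median_low


@dataclass
class Section:
    heading: str  # "" for any text that appears before the first heading
    body: list = field(default_factory=list)


def extract_lines(pdf_path):
    """Yield (text, avg_font_size, is_bold) for each text line via pdfminer.six."""
    # Imported lazily so group_sections() can be used without pdfminer installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    for page in extract_pages(pdf_path):
        for element in page:
            if not isinstance(element, LTTextContainer):
                continue
            for line in element:
                if not isinstance(line, LTTextLine):
                    continue
                chars = [obj for obj in line if isinstance(obj, LTChar)]
                text = line.get_text().strip()
                if not text or not chars:
                    continue
                size = sum(ch.size for ch in chars) / len(chars)
                # Heuristic: bold fonts usually carry "Bold" in their name, e.g. "Arial-BoldMT".
                bold = all("bold" in ch.fontname.lower() for ch in chars)
                yield text, size, bold


def group_sections(lines, size_ratio=1.2, max_bold_heading_len=80):
    """Start a new Section at each heading-like line; other lines become body text."""
    lines = list(lines)
    if not lines:
        return []
    body_size = median_low(size for _, size, _ in lines)  # approximates body text size
    sections = [Section(heading="")]
    for text, size, bold in lines:
        is_heading = size >= body_size * size_ratio or (
            bold and len(text) <= max_bold_heading_len
        )
        if is_heading:
            sections.append(Section(heading=text))
        else:
            sections[-1].body.append(text)
    # Drop the leading placeholder section if nothing preceded the first heading.
    return [s for s in sections if s.heading or s.body]
```

For example, `for s in group_sections(extract_lines("report.pdf")): print(s.heading, len(s.body))` (the filename is a placeholder). Keeping extraction and grouping separate means the thresholds can be checked on a few sample files before running the whole batch of a couple hundred.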
