Are there any models to extract specific data from pdf files?

349 Views Asked by whosevil13 At 29 June 2025 at 07:49

For my project, I am given large PDFs and currently have to extract one specific value (the commission) manually. I am looking for any machine learning or AI model that could automate this process. The structure of the PDFs varies, so ideally the model would scan a PDF and return the commission percentage whatever its layout. For example, the value can be given in any of these ways:

Commission Rate = 20%
The commission rate for this transaction is 20%.
Premium  Commission  Net
50000    20%         40000


There is 1 best solution below

Virgaux Pierre On 28 June 2022 at 20:06

I think your case is quite specific, and you will be hard pressed to find a model that does exactly what you want without prior work. In my opinion, you should take the following steps:

Annotate a representative sample of your dataset that covers the different PDF layouts.
Run an OCR tool (for example pytesseract) on each PDF, then use regexes to locate the desired information. Test this technique on one portion of the annotated set.
Finally, test on the rest of the annotated data to evaluate your approach.
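
As a rough illustration of the OCR, regex and evaluation steps above, here is a minimal Python sketch. The function names (`pdf_to_text`, `extract_commission`, `accuracy`) and both regex patterns are hypothetical and only cover the three formats shown in the question; real documents will almost certainly need more patterns. The use of pdf2image to render pages before OCR is also an assumption, since pytesseract works on images rather than PDFs directly.

```python
import re
from typing import Iterable, Optional, Tuple

# Illustrative patterns only (assumptions, tuned to the question's examples).
# Label followed by a percentage on the same line, e.g. "Commission Rate = 20%".
_LABELLED = re.compile(
    r"commission(?:\s+rate)?[^\d%\n]{0,40}?(\d+(?:\.\d+)?)\s*%",
    re.IGNORECASE,
)
# Any percentage value, e.g. "20%" or "12.5 %".
_PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*%")


def pdf_to_text(path: str) -> str:
    """OCR every page of a PDF into one string.

    Assumes pdf2image and pytesseract are installed, along with the
    Poppler and Tesseract binaries they depend on.
    """
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(path)  # one PIL image per page
    return "\n".join(pytesseract.image_to_string(page) for page in pages)


def extract_commission(text: str) -> Optional[float]:
    """Return the commission percentage found in text, or None."""
    # Case 1: the label and the value sit on the same line.
    match = _LABELLED.search(text)
    if match:
        return float(match.group(1))
    # Case 2: table layout, where a header row mentions "Commission"
    # and the next non-empty row holds the values.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for header, row in zip(lines, lines[1:]):
        if "commission" in header.lower():
            match = _PERCENT.search(row)
            if match:
                return float(match.group(1))
    return None


def accuracy(samples: Iterable[Tuple[str, Optional[float]]]) -> float:
    """Share of (text, annotated_value) pairs that are extracted correctly."""
    samples = list(samples)
    hits = sum(extract_commission(text) == expected for text, expected in samples)
    return hits / len(samples)
```

To follow the split described above, tune the patterns against one part of the annotated sample, then call `accuracy` once on the held-out part, passing pairs of OCR text (from `pdf_to_text`) and the annotated commission value.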
