Convert PDF to text without pdftotext?


I need to convert PDFs to text and am currently using pdftotext.exe. It sometimes garbles the resulting text, so I can't rely on it.

Is there another free tool that I can call from another program? I'd prefer a command-line tool.

There are 3 best solutions below

2

PDF can be tricky to convert to text, depending on how it's constructed, but you may get good results from iTextSharp or Ghostscript, or from a commercial component, e.g. from www.tallcomponents.com (not affiliated).
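As a rough sketch of the Ghostscript route, called from another program: Ghostscript ships a `txtwrite` output device that writes extracted text to a file. The helper names (`ghostscript_text_command`, `pdf_to_text`) and the default binary name `gs` are assumptions made for illustration; on Windows the console binary is usually `gswin64c` or `gswin32c`, so adjust for your install.

```python
import subprocess
from pathlib import Path


def ghostscript_text_command(pdf_path, txt_path, gs_binary="gs"):
    """Build the argument list for Ghostscript's txtwrite device.

    gs_binary is an assumption: "gs" on Unix-like systems,
    typically "gswin64c" on Windows.
    """
    return [
        gs_binary,
        "-q",                        # suppress startup messages
        "-dNOPAUSE",                 # do not pause between pages
        "-dBATCH",                   # exit when the file has been processed
        "-sDEVICE=txtwrite",         # text extraction device
        f"-sOutputFile={txt_path}",  # where the extracted text is written
        str(pdf_path),
    ]


def pdf_to_text(pdf_path, txt_path, gs_binary="gs"):
    """Run Ghostscript and return the extracted text (hypothetical helper)."""
    cmd = ghostscript_text_command(pdf_path, txt_path, gs_binary)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(txt_path).read_text(encoding="utf-8", errors="replace")
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.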

0

PDF files do not generally contain any structure, so the software needs to guess it. I wrote a blog post on the issues at http://www.jpedal.org/PDFblog/2009/04/pdf-text/
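To make the "guessing" concrete, here is a minimal sketch of the kind of heuristic an extractor has to apply. It assumes the page has already been decoded into positioned fragments of the form `(x, y, width, text)`; that representation, the function name `fragments_to_lines`, and the tolerance values are all hypothetical, and real tools use considerably more elaborate rules.

```python
def fragments_to_lines(fragments, y_tolerance=2.0, space_gap=1.5):
    """Group positioned text fragments into lines of text.

    Each fragment is (x, y, width, text). Fragments whose baselines lie
    within y_tolerance of the first fragment in a row are treated as the
    same line; a horizontal gap wider than space_gap is treated as a space.
    """
    rows = []  # list of (baseline_y, [fragments])
    # PDF y coordinates grow upward, so the largest y is the top of the page.
    for frag in sorted(fragments, key=lambda f: -f[1]):
        if rows and abs(rows[-1][0] - frag[1]) <= y_tolerance:
            rows[-1][1].append(frag)
        else:
            rows.append((frag[1], [frag]))

    lines = []
    for _, row in rows:
        row.sort(key=lambda f: f[0])  # left to right
        parts = []
        prev_end = None
        for x, _y, width, text in row:
            if prev_end is not None and x - prev_end > space_gap:
                parts.append(" ")     # guessed word break
            parts.append(text)
            prev_end = x + width
        lines.append("".join(parts))
    return lines
```

When the thresholds are wrong for a given document (tight kerning, multi-column layouts, rotated text), words merge or split and columns interleave, which is the sort of damage tools tend to produce on difficult files.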

You could also try PDFBox.

0

I find that Apache PDFBox is much better than pdftotext. It extracts text in a way that is much closer to the original formatting of the document. It can be run from the command line.
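A minimal sketch of calling the PDFBox command-line app from another program. The jar path is a placeholder you would replace with your own download, and the helper name `pdfbox_extract_command` is hypothetical. The subcommand differs between releases as far as I know (`ExtractText` in the 2.x app, `export:text` with `-i`/`-o` options in 3.x), so treat the exact arguments as an assumption and check them against the usage text of the version you install.

```python
import subprocess


def pdfbox_extract_command(jar_path, pdf_path, txt_path, major_version=2):
    """Build the java command line for the PDFBox standalone app.

    jar_path is a placeholder for wherever the pdfbox-app jar lives.
    The argument layout per major version is an assumption to verify
    against the installed release.
    """
    base = ["java", "-jar", str(jar_path)]
    if major_version >= 3:
        # 3.x style: subcommand plus -i/-o options
        return base + ["export:text", f"-i={pdf_path}", f"-o={txt_path}"]
    # 2.x style: ExtractText <input> <output>
    return base + ["ExtractText", str(pdf_path), str(txt_path)]


def run_pdfbox(jar_path, pdf_path, txt_path, major_version=2):
    """Run the extraction; raises CalledProcessError if java exits non-zero."""
    cmd = pdfbox_extract_command(jar_path, pdf_path, txt_path, major_version)
    subprocess.run(cmd, check=True)
```

This requires a Java runtime on the PATH, since the PDFBox app is distributed as a jar rather than a native executable.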