I have to convert PDFs to text, and I am currently using pdftotext.exe. It sometimes messes up the resulting text, so I can't rely on it.
Is there another free tool that I can call from another program? I'd prefer a command line tool.
PDF files do not generally contain any structure so the software needs to guess it. I wrote a blog post on the issues at http://www.jpedal.org/PDFblog/2009/04/pdf-text/
You could also try PdfBox.
I find that Apache PDFBox is much better than pdftotext. It extracts text in a way that is much closer to the original formatting of the document. It can be run from the command line.
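Since the question is about calling the tool from another program, here is a rough sketch of how that could look from Python. It assumes the PDFBox 2.x standalone "app" jar, whose command line tool is invoked as `ExtractText` (PDFBox 3.x changed the command syntax, so check the version you download). The jar path and the function names `build_pdfbox_command` and `pdf_to_text` are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_pdfbox_command(jar_path, pdf_path, txt_path, sort=False):
    """Build the argument list for the PDFBox 2.x ExtractText tool.

    Assumption: jar_path points at a pdfbox-app 2.x jar and `java`
    is on the PATH.
    """
    cmd = ["java", "-jar", str(jar_path), "ExtractText", "-encoding", "UTF-8"]
    if sort:
        # -sort asks PDFBox to order the text by position before writing it
        cmd.append("-sort")
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def pdf_to_text(jar_path, pdf_path, txt_path, sort=False):
    """Run PDFBox on one file and return the extracted text."""
    cmd = build_pdfbox_command(jar_path, pdf_path, txt_path, sort=sort)
    # check=True raises CalledProcessError if PDFBox exits with an error
    subprocess.run(cmd, check=True)
    return Path(txt_path).read_text(encoding="utf-8", errors="replace")
```

You would then call something like `pdf_to_text("pdfbox-app.jar", "input.pdf", "output.txt")`. Keeping the command construction in its own function makes it easy to log or adjust the exact command line when a particular PDF gives odd results.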
PDF can be tricky to convert to text, depending on how the file is constructed, but you may get good results from iTextSharp or Ghostscript, or from a commercial component, e.g. from www.tallcomponents.com (not affiliated).
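For the Ghostscript route, a minimal sketch of driving it from another program might look like this. It assumes a Ghostscript build that includes the `txtwrite` output device and an executable named `gs` on the PATH (on Windows the console executable is usually `gswin64c` or `gswin32c`). The function names `build_gs_command` and `gs_pdf_to_text` are hypothetical.

```python
import subprocess


def build_gs_command(pdf_path, txt_path, gs_exe="gs"):
    """Build a Ghostscript command line that writes plain text.

    Assumption: the installed Ghostscript provides the txtwrite device.
    """
    return [
        gs_exe,
        "-q",                  # suppress the startup banner
        "-dNOPAUSE",           # do not wait for a keypress between pages
        "-dBATCH",             # exit once the input has been processed
        "-sDEVICE=txtwrite",   # text extraction device
        f"-sOutputFile={txt_path}",
        str(pdf_path),
    ]


def gs_pdf_to_text(pdf_path, txt_path, gs_exe="gs"):
    """Run Ghostscript on one PDF; raises CalledProcessError on failure."""
    subprocess.run(build_gs_command(pdf_path, txt_path, gs_exe), check=True)
```

Whether this gives better output than pdftotext depends on the PDF, so it is worth trying a few of your problem files with each tool.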