Converting a high volume of PDFs into .html or .doc

I'm looking for a code snippet or other solution that can convert a high volume (thousands) of PDFs into .html or .doc while also:

  • maintaining the hierarchical structure of headings
  • capturing images in the document, uploading them to an image server, and creating an absolute link to each one
  • maintaining table formatting

Does such a tool exist and if so, who makes it? If not, who are some of the thought leaders in the space that I can connect with?

There is 1 best solution below

Check out pdftohtml, a command-line converter distributed with Poppler.

It converts one file at a time, so you can add some scripting around it to do a batch conversion.
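As a rough illustration of that batch scripting, here is a minimal Python sketch. It assumes the Poppler build of pdftohtml is on your PATH and uses its -noframes and -q options. The helper names build_command and convert_all are made up for this example, and it only looks at PDFs directly inside one folder. It does not attempt the image upload or link rewriting from the question.

```python
import subprocess
from pathlib import Path


def build_command(pdf_path, out_dir):
    """Return the pdftohtml argument list for one PDF (hypothetical helper)."""
    pdf_path = Path(pdf_path)
    # Assumption: pdftohtml appends ".html" to the output base name itself.
    out_base = Path(out_dir) / pdf_path.stem
    # -noframes: write one HTML file instead of a frameset; -q: suppress messages.
    return ["pdftohtml", "-noframes", "-q", str(pdf_path), str(out_base)]


def convert_all(src_dir, out_dir, run=subprocess.run):
    """Convert every *.pdf directly inside src_dir; return the PDFs that failed."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        result = run(build_command(pdf, out_dir))
        if result.returncode != 0:
            # Keep going so one bad file does not stop a run of thousands.
            failed.append(pdf)
    return failed
```

Calling convert_all("pdfs", "html_out") would then process the folder and return the list of files that pdftohtml rejected, which you can retry or inspect separately.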

The results aren't that great, though, so heading structure and table formatting will likely need post-processing.