I have been trying to install the Scraperwiki module for Python. However, it generates the following error:
"UserWarning: Local Scraperlibs requires pdftohtml, but pdftohtml was not found in the PATH. You probably need to install it".
I looked into poppler, as it includes a pdftohtml file, but I don't know how it works: is there a Python library I need to install, or an .exe file? And how do I go about installing it? I'm running on Windows.
Many Thanks
If you're not intending to use scraperwiki.pdftoxml(), then the warning doesn't apply. It doesn't stop you from installing the scraperwiki package, however.
Also, that function doesn't work on Windows at all as is; it uses NamedTemporaryFile, which behaves differently on Windows than on Linux.
If you do want to use that function, the simplest way to get an up-to-date version of pdftohtml on Windows is to download Calibre Portable. (The version on Sourceforge is older.)
Install it anywhere; you just need a few files from it. From the folder containing calibre.exe in your install, copy pdftohtml.exe into your working folder. From the DLLs folder in the Calibre install, also copy freetype.dll, jpeg.dll, libpng12.dll and zlib1.dll.
You'll also need your own code based on scraperwiki.pdftoxml(), rather than calling that function directly.
(I was trying to get this working for a user in Windows recently; I'll keep the gist containing this code updated.)
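The code from the gist isn't reproduced here, so what follows is only a minimal sketch of the idea, not the gist itself. It avoids NamedTemporaryFile by writing the PDF into a temporary folder, closing it, and only then running pdftohtml. The names build_command, exe and run are made up for this illustration, and the flags are the ones scraperwiki's own function passes as far as I recall, so treat them as assumptions. It also assumes pdftohtml.exe and the DLLs sit in your working folder, and that pdftohtml writes its -xml output to the given name plus ".xml".

```python
import os
import shutil
import subprocess
import tempfile


def build_command(exe, pdf_path, xml_stem, options=()):
    """Assemble the pdftohtml argument list (flags assumed, see above)."""
    return ([exe, "-xml", "-nodrm", "-zoom", "1.5", "-enc", "UTF-8", "-noframes"]
            + list(options) + [pdf_path, xml_stem])


def pdftoxml(pdfdata, exe="pdftohtml.exe", options=(), run=subprocess.call):
    """Convert PDF bytes to pdftohtml's XML output, returned as text.

    `run` is a hypothetical hook so the external call can be swapped out;
    by default it is subprocess.call.
    """
    workdir = tempfile.mkdtemp()
    try:
        pdf_path = os.path.join(workdir, "input.pdf")
        xml_stem = os.path.join(workdir, "output")

        # Write and close the PDF first, so no handle is still open
        # when pdftohtml tries to read it (this is what trips up Windows).
        with open(pdf_path, "wb") as pdf_file:
            pdf_file.write(pdfdata)

        # pdftohtml is chatty, so discard its stdout and stderr.
        with open(os.devnull, "w") as devnull:
            run(build_command(exe, pdf_path, xml_stem, options),
                stdout=devnull, stderr=devnull)

        # Assumed: the XML ends up at <xml_stem>.xml.
        with open(xml_stem + ".xml", "rb") as xml_file:
            return xml_file.read().decode("utf-8")
    finally:
        # Remove the temporary folder and everything in it.
        shutil.rmtree(workdir, ignore_errors=True)
```

You would call it with the raw bytes of a PDF, for example pdftoxml(open("some.pdf", "rb").read()), and pass exe=... if pdftohtml.exe lives somewhere other than the working folder.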