I am trying to use Gumbo's Python wrapper to parse HTML.
My operating system is Ubuntu 14.04.3 LTS and I am using Python 2.7.
I have cloned the latest version of Gumbo from GitHub.
I followed the installation steps provided on GitHub.
The installation of both the C library and the Python wrapper appeared to be successful (no error messages, and both printed success messages at the end).
C library final message:
Libraries have been installed in: /usr/local/lib
Python wrapper message:
Installed /usr/local/lib/python2.7/dist-packages/gumbo-0.10.1-py2.7.egg
Processing dependencies for gumbo==0.10.1
Finished processing dependencies for gumbo==0.10.1
The first problem I encountered was when I tried to open pydoc for gumbo, to better understand the library.
pydoc gumbo
produced the following error:
problem in gumbo - <type 'exceptions.OSError'>: /usr/local/lib/python2.7/dist-packages/gumbo-0.10.1-py2.7.egg/gumbo/libgumbo.so: cannot open shared object file: No such file or directory
Searching for the message yielded a single result.
It was not of much use to me.
Looking at the dist-packages directory, I noticed that libgumbo.so was not in /usr/local/lib/python2.7/dist-packages/gumbo-0.10.1-py2.7.egg/gumbo/. All the other files (soup_adapter.py, gumboc.py, etc.) were there, however.
The installation of the C library placed libgumbo.so (and some other libraries, such as libgumbo.a and libgumbo.la) in /usr/local/lib. So, as a workaround, I created a symlink from .../dist-packages/gumbo-0.10.1-py2.7.egg/gumbo/ to /usr/local/lib.
This got pydoc gumbo to work.
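For reference, the workaround above can be sketched in Python. This is only a minimal sketch, assuming the link is a file-level symlink named libgumbo.so placed inside the package directory and pointing at the copy in /usr/local/lib. The helper name link_shared_library is hypothetical and not part of Gumbo.

```python
import os


def link_shared_library(lib_path, package_dir):
    """Create a symlink to lib_path inside package_dir and return its path.

    Hypothetical helper for illustration only; not part of Gumbo.
    """
    link_path = os.path.join(package_dir, os.path.basename(lib_path))
    # lexists() is also true for a dangling symlink, which exists() would miss.
    if not os.path.lexists(link_path):
        os.symlink(lib_path, link_path)
    return link_path


# Example call with the paths from the messages above (needs root to write):
# link_shared_library(
#     "/usr/local/lib/libgumbo.so",
#     "/usr/local/lib/python2.7/dist-packages/gumbo-0.10.1-py2.7.egg/gumbo",
# )
```

Calling the helper a second time leaves an existing link untouched, so it is safe to rerun.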
Afterwards, I tried to import gumbo and soup_adapter in the interpreter and received the following error:
>>> import soup_adapter
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "soup_adapter.py", line 26, in <module>
    import gumboc
  File "gumboc.py", line 44, in <module>
    os.path.dirname(__file__), _name_of_lib))
  File "/usr/lib/python2.7/ctypes/__init__.py", line 443, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python2.7/ctypes/__init__.py", line 365, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libgumbo.so: cannot open shared object file: No such file or directory
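The traceback shows gumboc.py building a path from os.path.dirname(__file__) and the library name, then passing it to ctypes. Below is a minimal diagnostic sketch of that lookup, assuming this is the only location being tried. The helper names candidate_library_path and describe_lookup are hypothetical and not part of Gumbo. ctypes.util.find_library is from the standard library and returns None when it cannot find the library.

```python
import ctypes.util
import os


def candidate_library_path(module_file, lib_name="libgumbo.so"):
    """Return the path built from the module's directory and lib_name."""
    return os.path.join(os.path.dirname(module_file), lib_name)


def describe_lookup(module_file, lib_name="libgumbo.so"):
    """Collect basic facts about where the shared library might be found.

    Hypothetical helper for illustration only; not part of Gumbo.
    """
    path = candidate_library_path(module_file, lib_name)
    return {
        "candidate": path,
        "exists": os.path.exists(path),
        # Standard-library lookup by short name; None if nothing is found.
        "system_lookup": ctypes.util.find_library("gumbo"),
    }


# Example: the module path exactly as it appears in the traceback.
# print(describe_lookup("gumboc.py"))
```

With a relative module path such as gumboc.py, as in the traceback, os.path.dirname returns an empty string, so the candidate is the bare name libgumbo.so. That matches the name in the error message.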
I am not sure how to proceed or how exactly to get gumbo to work.