Installing Gumbo and its Python wrapper


I am trying to use Gumbo's python wrapper to parse HTML.

My operating system is Ubuntu 14.04.3 LTS, and I am using Python 2.7.

I cloned the latest version of Gumbo from GitHub.

I followed the installation steps provided on GitHub.

The installation of both the C library and the Python wrapper appeared to succeed (no error messages, and both printed a success message at the end).

C library final message:

Libraries have been installed in: /usr/local/lib

Python wrapper message:

Installed /usr/local/lib/python2.7/dist-packages/gumbo-0.10.1-py2.7.egg
Processing dependencies for gumbo==0.10.1
Finished processing dependencies for gumbo==0.10.1

The first problem I encountered was when I tried to open pydoc for gumbo, to better understand the library.

pydoc gumbo produced the following error:

problem in gumbo - <type 'exceptions.OSError'>: /usr/local/lib/python2.7/dist-packages/gumbo-0.10.1-py2.7.egg/gumbo/libgumbo.so: cannot open shared object file: No such file or directory

Searching for the message yielded a single result, which was not of much use to me.

Looking at the dist-packages directory, I noticed that libgumbo.so was not in /usr/local/lib/python2.7/dist-packages/gumbo-0.10.1-py2.7.egg/gumbo/. All the other files (soup_adapter.py, gumboc.py, etc.) were there, however.

The installation of the C library placed libgumbo.so (and some other libraries, such as libgumbo.a and libgumbo.la) in /usr/local/lib. So, as a workaround, I created a symlink from .../dist-packages/gumbo-0.10.1-py2.7.egg/gumbo/ to /usr/local/lib.

This got pydoc gumbo to work.
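
For reference, the workaround can be sketched in Python roughly as follows. This is only a minimal sketch, assuming the symlink is made for the libgumbo.so file itself; link_shared_lib is a hypothetical helper name and not part of Gumbo.

```python
import os

def link_shared_lib(src_lib, package_dir):
    """Symlink src_lib into package_dir and return the path of the link.

    Hypothetical helper for illustration; it is not part of Gumbo.
    An existing file or link with the same name is left untouched.
    """
    link_path = os.path.join(package_dir, os.path.basename(src_lib))
    if not os.path.lexists(link_path):
        os.symlink(src_lib, link_path)
    return link_path

# Example call with the paths from the install output above
# (writing to dist-packages usually needs root):
# link_shared_lib(
#     "/usr/local/lib/libgumbo.so",
#     "/usr/local/lib/python2.7/dist-packages/gumbo-0.10.1-py2.7.egg/gumbo",
# )
```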

Afterwards, I tried to import gumbo and soup_adapter in the interpreter. I received the following error:

>>> import soup_adapter
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "soup_adapter.py", line 26, in <module>
    import gumboc
  File "gumboc.py", line 44, in <module>
    os.path.dirname(__file__), _name_of_lib))
  File "/usr/lib/python2.7/ctypes/__init__.py", line 443, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python2.7/ctypes/__init__.py", line 365, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libgumbo.so: cannot open shared object file: No such file or directory
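
One difference from the pydoc error is that this message names a bare libgumbo.so rather than the full egg path, which may be a hint. A minimal sketch of how that could come about, assuming gumboc.py builds the path with os.path.join as the traceback lines suggest, and assuming the interpreter was started in the directory containing gumboc.py (so that __file__ is a relative name); resolve_lib_path is a hypothetical helper, not part of Gumbo:

```python
import os

def resolve_lib_path(module_file, lib_name="libgumbo.so"):
    """Mimic os.path.join(os.path.dirname(__file__), _name_of_lib).

    Hypothetical helper; module_file stands in for gumboc.py's __file__.
    """
    return os.path.join(os.path.dirname(module_file), lib_name)

# Module imported from the current directory: __file__ has no directory part,
# so the result is a bare file name.
print(resolve_lib_path("gumboc.py"))

# Module imported from the installed egg: the result is an absolute path.
print(resolve_lib_path(
    "/usr/local/lib/python2.7/dist-packages/gumbo-0.10.1-py2.7.egg/gumbo/gumboc.py"
))
```

When the name passed to the loader contains no directory part, the dynamic linker searches its configured library paths instead of one specific directory, so the lookup no longer depends on the symlink in the egg directory.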

I am not sure how to proceed or how exactly to get gumbo to work.
