How do I interact with HBase via Knox using Python?


I'm trying to interact with HBase through Knox using Python. The admin gave me a list of Knox API endpoints for Hive, HBase and Spark, such as: https://knox-devl.www.mysite.com:9042/gateway/MYSITEHDO/hbaseversion/cluster

Since I'm using Python's happybase library, my connection code is:

import happybase

connection=happybase.Connection('https://knox-devl.www.mysite.com/gateway/MYSITEHDO/hbaseversion/cluster',port=9042)
connection.open()
print(connection.tables())

The error it shows is: thriftpy.transport.TTransportException: TTransportException(message="Could not connect to ('https://knox-devl.www.mysite.com/gateway/MYSITEHDO/hbaseversion/cluster', 9042)", type=1)

I also tried the phoenixdb library:

import phoenixdb

database_url = 'https://knox-devl.www.mysite.com:9042/gateway/MYSITEHDO/hbaseversion/cluster'
conn = phoenixdb.connect(database_url, autocommit=True)
cursor = conn.cursor()
cursor.execute("SHOW tables")

But I'm getting another error: phoenixdb.errors.InterfaceError: ('RPC request failed', None, None, BadStatusLine("''",)) Exception phoenixdb.errors.InterfaceError: InterfaceError('RPC request failed', None, None, BadStatusLine("''",)) in <bound method Connection.__del__ of <phoenixdb.connection.Connection object at 0x10bc97d90>> ignored

The only way I can get any data back is through curl:

curl -i -k -u guest:guest-password 'https://knox-devl.www.mysite.com:9042/gateway/MYSITEHDO/hbaseversion/cluster'

But there are no SQL commands available that way.

Does anyone know how to do this, or is there something I'm missing here, such as asking for a different URL or enabling something on the cluster?

Best answer

As you identified, the only way to talk to HBase through Knox is via HBase's REST API. Happybase tries to connect directly to HBase's Thrift server over RPC, which Knox will block.

You can't use Happybase from outside a cluster with Knox enabled.
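For reference, the curl call from the question can be reproduced with the requests library roughly like this. It is only a sketch: the URL and the guest credentials are copied from the question, `verify=False` mirrors curl's `-k` flag (it skips certificate checks, so it is only suitable for testing), and the helper names `request_kwargs` and `get_cluster_version` are made up for illustration.

```python
KNOX_URL = "https://knox-devl.www.mysite.com:9042/gateway/MYSITEHDO/hbaseversion/cluster"


def request_kwargs(user, password, verify_tls=False):
    """Build the keyword arguments that mirror `curl -k -u user:password`."""
    return {
        "auth": (user, password),   # HTTP basic auth, like curl -u
        "verify": verify_tls,       # False skips TLS verification, like curl -k
        "timeout": 30,              # seconds; an arbitrary choice for this sketch
    }


def get_cluster_version(url=KNOX_URL, user="guest", password="guest-password"):
    """Fetch the endpoint through Knox and return the raw response body."""
    # Imported here so request_kwargs() can be used without requests installed.
    import requests

    response = requests.get(url, **request_kwargs(user, password))
    response.raise_for_status()  # raise on 401/403/404 instead of failing silently
    return response.text


if __name__ == "__main__":
    print(get_cluster_version())
```

If this works, the same `auth` and `verify` arguments can be passed to the other requests calls shown below.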

A good tutorial for using the HBase REST API with Python can be found here. In case the link ever dies, some of the most useful commands from this article are:

  • Look at a table's schema:

    request = requests.get(baseurl + "/" + tablename + "/schema")
    
  • Insert a row:

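    import base64
    from xml.etree.ElementTree import Element, SubElement, tostring

    import requests

    def b64(text):
        # The REST API expects row keys, column names and values base64-encoded
        return base64.b64encode(text.encode("utf-8")).decode("ascii")

    # baseurl, tablename, username, filename, shakespeare (an iterable of lines)
    # and the base64-encoded column names (messagecolumnencoded, etc.) are
    # assumed to be defined earlier, as in the tutorial.
    cellset = Element('CellSet')
    linenumber = 0

    for line in shakespeare:
        rowKey = username + "-" + filename + "-" + str(linenumber).zfill(6)
        rowKeyEncoded = b64(rowKey)

        row = SubElement(cellset, 'Row', key=rowKeyEncoded)

        messageencoded = b64(line.strip())
        # encode() is the tutorial's helper for base64-encoding an int (not shown here)
        linenumberencoded = encode(linenumber)
        usernameencoded = b64(username)

        # Add bleet cell
        cell = SubElement(row, 'Cell', column=messagecolumnencoded)
        cell.text = messageencoded

        # Add username cell
        cell = SubElement(row, 'Cell', column=usernamecolumnencoded)
        cell.text = usernameencoded

        # Add Line Number cell
        cell = SubElement(row, 'Cell', column=linenumbercolumnencoded)
        cell.text = linenumberencoded

        linenumber = linenumber + 1

    # Submit XML to REST server once, after all rows have been added
    request = requests.post(baseurl + "/" + tablename + "/fakerow", data=tostring(cellset), headers={"Content-Type" : "text/xml", "Accept" : "text/xml"})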
    
  • Delete a table:

    request = requests.delete(baseurl + "/" + tablename + "/schema")
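
Reading data back works the same way, but keep in mind that the REST API returns row keys, column names and cell values base64-encoded. A rough sketch of decoding a JSON response (the kind you get by sending `Accept: application/json`) is below. The function name `decode_cellset` and the sample payload are made up for illustration, and it assumes every value is UTF-8 text, which will not hold for cells that store packed integers or other binary data.

```python
import base64
import json


def decode_cellset(payload):
    """Convert an HBase REST JSON CellSet into {row_key: {column: value}}."""

    def b64decode_text(text):
        # Assumes the decoded bytes are UTF-8 text
        return base64.b64decode(text).decode("utf-8")

    rows = {}
    for row in payload.get("Row", []):
        cells = {}
        for cell in row.get("Cell", []):
            # In the JSON representation the cell value is stored under the "$" key
            cells[b64decode_text(cell["column"])] = b64decode_text(cell["$"])
        rows[b64decode_text(row["key"])] = cells
    return rows


# Hand-written sample shaped like a REST response, not real cluster output
sample = json.loads(
    '{"Row": [{"key": "cm93MQ==", "Cell": '
    '[{"column": "Y2Y6bXNn", "timestamp": 1, "$": "aGVsbG8="}]}]}'
)
print(decode_cellset(sample))
```

With a live cluster you would pass `response.json()` from a GET on a row URL instead of the hand-written sample.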