Connecting to Hive on an MIT Kerberos-authenticated Cloudera Hadoop server from a server without Kerberos


I want to connect to the Hive service on an MIT Kerberos-authenticated Cloudera Hadoop server. I am using a Python script hosted on a Windows server with no Kerberos installed, in a conda environment with Python 3.9.7 and PyHive 0.6.5. Since the Windows server does not have a Kerberos client, I copied the krb5.conf and keytab files from the Cloudera server to the Windows server, renamed krb5.conf to krb5.ini, and added their paths to environment variables:

from pyhive import hive
import os
# Raw strings so the backslashes in the Windows paths are not treated as escapes
os.environ['KRB5_CONFIG'] = r'PATH\TO\krb5.ini'
os.environ['KRB5_CLIENT_KTNAME'] = r'PATH\TO\hive.service.keytab'

conn = hive.Connection(host="some-ip-address", port=4202, auth='KERBEROS', kerberos_service_name='hive')
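As a quick sanity check before connecting (using the same placeholder paths as above), it is worth confirming that the environment variables actually point at files that exist, since a mistyped path fails silently and only surfaces later as an authentication error:

import os

# Verify each Kerberos-related variable is set and that its file exists.
for var in ('KRB5_CONFIG', 'KRB5_CLIENT_KTNAME'):
    path = os.environ.get(var)
    print(var, '->', path, '| exists:', bool(path) and os.path.exists(path))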

The connection failed. Below is the error message:

(myenv) C:\Users\myname\Desktop>python hivetest.py
Traceback (most recent call last):
  File "C:\Users\myname\Desktop\hivetest.py", line 34, in <module>
    hiveconn=hive.Connection(host="some-ip-address",port=4202, auth='KERBEROS', kerberos_service_name='hive')
  File "C:\Users\myname\AppData\Local\conda\conda\envs\myenv\lib\site-packages\pyhive\hive.py", line 243, in __init__
    self._transport.open()
  File "C:\Users\myname\AppData\Local\conda\conda\envs\myenv\lib\site-packages\thrift_sasl\__init__.py", line 84, in open
    raise TTransportException(type=TTransportException.NOT_OPEN,
thrift.transport.TTransport.TTransportException: Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2'
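The error is raised while opening the SASL transport, before anything Hive-specific happens. A minimal sketch of what thrift_sasl does under the hood (assuming the same sasl package that PyHive pulls in; host and service name are the same placeholders as above) reproduces the failure: if the underlying Cyrus SASL library has no usable GSSAPI mechanism, start('GSSAPI') fails with the same kind of "no mechanism available" error.

import sasl

# Mirror the SASL client setup that PyHive's connection performs internally.
client = sasl.Client()
client.setAttr('host', 'some-ip-address')
client.setAttr('service', 'hive')
client.init()

ok, chosen_mech, initial = client.start('GSSAPI')
print('started:', ok, '| error:', None if ok else client.getError())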

When the Hadoop server was not Kerberos-authenticated, I was able to connect to the Hive service with this line:

conn = hive.Connection(host="ip-address", port=4202, username="some-user", auth="NONE")
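For reference, a connection obtained this way is driven through PyHive's standard DB-API interface; any lightweight statement serves as a connectivity smoke test (the query here is only an illustration):

# Run a trivial query to confirm the connection actually works end to end.
cursor = conn.cursor()
cursor.execute('SHOW DATABASES')
print(cursor.fetchall())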

I removed the lines that set the environment variables, just to check whether that would produce a different error message, but the error was the same as shown above.

PyHive's Connection class accepts many initialization parameters, one of which is configuration. I tried configurations like the ones below, but none of them worked; each attempt failed with the same error message.

config1={
    'hive.metastore.client.principal':'[email protected]',
    'hive.metastore.sasl.enabled': 'true',
    'hive.metastore.client.keytab': 'PATH\\TO\\keytab',
}
hiveconn=hive.Connection(host="some-ip",port=4202, auth='KERBEROS', kerberos_service_name='hive', configuration=config1)

config2={
    'hive.server2.authentication.kerberos.principal':'[email protected]',
    'hive.server2.authentication.kerberos.keytab': 'PATH\\TO\\keytab',
}
hiveconn=hive.Connection(host="some-ip",port=4202, auth='KERBEROS', kerberos_service_name='hive', configuration=config2)

Am I doing something wrong with the prerequisites for this connection, irrespective of the Python library? Is it mandatory to install a Kerberos client on a server before connecting to another Kerberos-authenticated Hadoop server?


1 Answer

uds0128:

I have not used the PyHive library, but I have some experience with Kerberos. You mentioned that the Windows machine hosting the Python script does not have a Kerberos client; to my understanding, you need one. In general, most libraries that use Kerberos cannot acquire a Kerberos ticket from the KDC themselves; they can only use session tickets that have already been acquired. Even when such a library does acquire a ticket, it does so through the Kerberos client APIs, i.e. it relies on the platform's Kerberos client.
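As an illustration of that point (a sketch only, assuming the winkerberos package, which wraps the Windows SSPI and mirrors the pykerberos API; 'hive@some-ip-address' is a placeholder service name), a library-level GSSAPI handshake consumes credentials the operating system already holds rather than reading a keytab itself:

import winkerberos as kerberos

# Create a client context for the Hive service; this fails unless the
# platform's Kerberos machinery already has usable credentials.
status, ctx = kerberos.authGSSClientInit('hive@some-ip-address')
kerberos.authGSSClientStep(ctx, '')           # produce the first GSSAPI token
token = kerberos.authGSSClientResponse(ctx)   # base64 token sent to the server
print(token[:40], '...')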

In this case, you have to install a Kerberos client on the Windows machine. Modify your krb5.ini file to contact the remote KDC. Make sure that you can acquire a Kerberos ticket using the Kerberos client itself, not the Python script. Once you are able to acquire the ticket with the Kerberos client, you can move on to the Python script. It should work.
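A minimal sketch of that verification step, assuming MIT Kerberos for Windows is installed with kinit and klist on PATH (the keytab path and the principal 'hive/some-host@EXAMPLE.COM' below are placeholders):

import subprocess

# Acquire a ticket from the keytab, then list the credential cache; if either
# command fails here, the Python script has no chance of authenticating either.
subprocess.run(
    ['kinit', '-kt', r'PATH\TO\hive.service.keytab', 'hive/some-host@EXAMPLE.COM'],
    check=True,
)
subprocess.run(['klist'], check=True)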