I am trying to run presto query in python using pyhive library but max retries error is coming. I am running it in jupyter notebook locally(laptop). I think its not able to connect to presto node. I am using Azure hdinsight cluster and installed presto application on head node(using starburst distribution). I have used cluster user name and password and also i have tried head node ssh user and password but nothing is working. Below is my code:
from pyhive import presto
conn= presto.connect(
host='clustername-ssh.azurehdinsight.net',
port=8085,
username='sshuser'
password='sshpassword',
protocol='https'
).cursor()
conn.execute('SELECT * FROM hive.default.parquettest limit 1')
The error i am getting is:
ConnectionError: HTTPConnectionPool(host='sm-hdinsight01-ssh.azurehdinsight.net', port=8085): Max retries exceeded with url: /v1/statement (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 110] Connection timed out',))
But when i am running it in terminal of head node it works:
from pyhive import presto
conn= presto.connect(
host='localhost',
port=8085).cursor()
conn.execute('SELECT * FROM hive.default.parquettest limit 1')
I think i am missing some crucial thing here. please help.
sounds like an permission/authentification problem. i am currently using a jupyter notebook on my local machine t query the company presto cluster like this using the
prestodb
library.so basically: