How to run presto queries in python using pyhive?

6.5k Views Asked by At

I am trying to run presto query in python using pyhive library but max retries error is coming. I am running it in jupyter notebook locally(laptop). I think its not able to connect to presto node. I am using Azure hdinsight cluster and installed presto application on head node(using starburst distribution). I have used cluster user name and password and also i have tried head node ssh user and password but nothing is working. Below is my code:

from pyhive import presto
conn= presto.connect(
    host='clustername-ssh.azurehdinsight.net',
    port=8085,
    username='sshuser'
    password='sshpassword',
    protocol='https'
    ).cursor()
conn.execute('SELECT * FROM hive.default.parquettest limit 1')

The error i am getting is:

ConnectionError: HTTPConnectionPool(host='sm-hdinsight01-ssh.azurehdinsight.net', port=8085): Max retries exceeded with url: /v1/statement (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 110] Connection timed out',))

But when i am running it in terminal of head node it works:

from pyhive import presto
conn= presto.connect(
    host='localhost',
    port=8085).cursor()
conn.execute('SELECT * FROM hive.default.parquettest limit 1')

I think i am missing some crucial thing here. please help.

1

There are 1 best solutions below

0
On

sounds like an permission/authentification problem. i am currently using a jupyter notebook on my local machine t query the company presto cluster like this using the prestodb library.

so basically:

import prestodb

conn=prestodb.dbapi.connect(
    host='presto.bar.foo.com',
    port=80, user='foo',
    password='bar'
    catalog='hive',
    schema='default',
)

cur = conn.cursor()
cur.execute(
'SELECT * FROM "schema"."db" limit 10')
records = cur.fetchall()
print(records[0])