How to run Druid query in Python (Druid-Python connection)?

516 Views Asked by At

I want to run some druid queries in Python. Can someone please tell me how to do that?

I did try with localhost druid and it works but it doesn't work when I use the production instance of druids which is hosted in the cloud.

Here is what I did

from druid_query.client import Client
from druid_query.queries import Sql

client = Client(sql_endpoint='http://druid_url.vpc/console.html#query')

query = Sql('SELECT * FROM table1 LIMIT 5')

result = client.execute(query)

print(result)
   

and the error that I get is as below

Traceback (most recent call last):
  File "/Users/pankaj/pani/scripts/druid.py", line 8, in <module>
    result = client.execute(query)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/druid_query/client.py", line 50, in execute
    return process_response(resp, query)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/druid_query/client.py", line 94, in process_response
    return resp.json()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/httpx/_models.py", line 742, in json
    return jsonlib.loads(self.content.decode(encoding), **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I also tried the broker API (via cURL) to access data from the druid query but again it only works for localhost.

FYI - I am trying to run this Python script on an AWS machine and druid is installed on some other machine. I am unable to connect my Python script with Druid. Normally I use vpn to work in druid

1

There are 1 best solutions below

0
On

You should query the API endpoint and not the UI html page.

The API should be something like this: http://druid-host:druid-port/druid/v2/sql

Where druid-host is the exposed hostname of Druid cluster. And druid-port is the exposed post.

See this documentation from Apache.

I don't know these packages you are using "druid_query" but query durid is very simple all you need to do is http post to the above endpoint with the query as the post payload.

From documentation:

import json
import requests

# Make sure you replace `your-instance`, and `port` with the values for your deployment.
url = "https://<your-instance>:<port>/druid/v2/sql/task/"

payload = json.dumps({
    "query": "INSERT INTO wikipedia\nSELECT\n  TIME_PARSE(\"timestamp\") AS __time,\n  *\nFROM TABLE(\n  EXTERN(\n    '{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n    '{\"type\": \"json\"}',\n    '[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \"commentLength\", \"type\": \"long\"}, {\"name\": \"countryIsoCode\", \"type\": \"string\"}, {\"name\": \"countryName\", \"type\": \"string\"}, {\"name\": \"deleted\", \"type\": \"long\"}, {\"name\": \"delta\", \"type\": \"long\"}, {\"name\": \"deltaBucket\", \"type\": \"string\"}, {\"name\": \"diffUrl\", \"type\": \"string\"}, {\"name\": \"flags\", \"type\": \"string\"}, {\"name\": \"isAnonymous\", \"type\": \"string\"}, {\"name\": \"isMinor\", \"type\": \"string\"}, {\"name\": \"isNew\", \"type\": \"string\"}, {\"name\": \"isRobot\", \"type\": \"string\"}, {\"name\": \"isUnpatrolled\", \"type\": \"string\"}, {\"name\": \"metroCode\", \"type\": \"string\"}, {\"name\": \"namespace\", \"type\": \"string\"}, {\"name\": \"page\", \"type\": \"string\"}, {\"name\": \"regionIsoCode\", \"type\": \"string\"}, {\"name\": \"regionName\", \"type\": \"string\"}, {\"name\": \"timestamp\", \"type\": \"string\"}, {\"name\": \"user\", \"type\": \"string\"}]'\n  )\n)\nPARTITIONED BY DAY",
    "context": {
        "maxNumTasks": 3
    }
})
headers = {
    'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, data=payload, auth=('USER', 'PASSWORD'))

print(response)