I get an error when I try to query an Apache Druid datasource. Here is the code I used to load the Druid datasource into a pandas DataFrame:
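from pydruid.client import PyDruid
from pydruid.utils.aggregators import doublesum
from pydruid.utils.filters import Dimension

# Query node URL and the endpoint path passed to the client
query = PyDruid("http://10.XXX.XX.XXX:8082/", 'druid/v2/sql')

# Native timeseries query: sum of "Age" over rows where Age == 18
ts = query.timeseries(
    datasource='mallCustomers',
    granularity='all',
    intervals='2015-01-01/pt1h',
    aggregations={"count": doublesum("Age")},
    filter=Dimension('Age') == 18
)

# Convert the result of the last query into a pandas DataFrame
df = query.export_pandas()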
- 10.XXX.XX.XXX is my query node
- pydruid==0.6.5
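For comparison: the query that pydruid prints in the error output is a Druid native query, and as far as I can tell from the Druid docs, native queries are POSTed as JSON to the /druid/v2 endpoint (the pydruid README examples also pass 'druid/v2', not 'druid/v2/sql', as the endpoint). The sketch below uses only the standard library to build, but not send, such a native request. BROKER_URL and build_native_request are placeholders of mine, not part of pydruid or Druid.

```python
import json
import urllib.request

# Placeholder address; substitute your own Broker/query node host and port.
BROKER_URL = "http://localhost:8082"


def build_native_request(broker_url, native_query):
    """Hypothetical helper: return an unsent Request that POSTs a native query to /druid/v2."""
    body = json.dumps(native_query).encode("utf-8")
    return urllib.request.Request(
        url=broker_url.rstrip("/") + "/druid/v2",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same timeseries query that pydruid generated in the error output.
timeseries_query = {
    "queryType": "timeseries",
    "dataSource": "mallCustomers",
    "granularity": "all",
    "intervals": "2015-01-01/pt1h",
    "aggregations": [{"type": "doubleSum", "name": "count", "fieldName": "Age"}],
    "filter": {"type": "selector", "dimension": "Age", "value": 18},
}

req = build_native_request(BROKER_URL, timeseries_query)
print(req.full_url)      # http://localhost:8082/druid/v2
print(req.get_method())  # POST
# Sending it requires a reachable Broker: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect exactly which URL and body go over the wire.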
I also tried port 8083, but that throws a urllib.error.URLError as well.
How can I query a Druid table from Python? I couldn't find a reliable guide or tutorial for this, and the examples I did find throw similar errors.
Can you help me with this?
Here is the full error:
OSError: HTTP Error 400: Bad Request
Druid Error: b'{"error":"Cannot construct instance of `org.apache.druid.sql.http.SqlQuery`, problem: query\\n at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 255]"}'
Query is: {
    "aggregations": [
        {
            "fieldName": "Age",
            "name": "count",
            "type": "doubleSum"
        }
    ],
    "dataSource": "mallCustomers",
    "filter": {
        "dimension": "Age",
        "type": "selector",
        "value": 18
    },
    "granularity": "all",
    "intervals": "2015-01-01/pt1h",
    "queryType": "timeseries"
}
Process finished with exit code 1
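The error says Druid could not build an org.apache.druid.sql.http.SqlQuery from the request body, with "problem: query". My reading is that the SQL endpoint (/druid/v2/sql) expects a JSON object whose "query" field holds a SQL string, not a native JSON query. A minimal stdlib sketch of a request in that shape follows; BROKER_URL and build_sql_request are placeholders of mine, and the SQL text is only an illustrative equivalent of the timeseries query above that I have not run against this datasource.

```python
import json
import urllib.request

# Placeholder address; substitute your own Broker/query node host and port.
BROKER_URL = "http://localhost:8082"


def build_sql_request(broker_url, sql):
    """Hypothetical helper: return an unsent Request that POSTs SQL to /druid/v2/sql."""
    # The SQL endpoint expects a JSON object with a "query" key holding the SQL text.
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=broker_url.rstrip("/") + "/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Illustrative SQL roughly equivalent to the native timeseries query above.
sql = """
SELECT SUM("Age") AS "count"
FROM "mallCustomers"
WHERE "Age" = 18
  AND "__time" >= TIMESTAMP '2015-01-01 00:00:00'
  AND "__time" < TIMESTAMP '2015-01-01 01:00:00'
"""

req = build_sql_request(BROKER_URL, sql)
print(req.full_url)  # http://localhost:8082/druid/v2/sql

# With a reachable Broker, the default result format is a JSON array of row
# objects, which pandas can load directly:
#   with urllib.request.urlopen(req) as resp:
#       rows = json.load(resp)
#   df = pandas.DataFrame(rows)
```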