How to get results of multiple aggregations in a single druid query?

925 Views Asked by At

Say I have the following table named t_student_details:

 Name    Age    Marks     Sport       City     ........ (multiple columns)
======  =====  =======   =======     ======
Jason    11      45       tennis     New York
Mark     12      42       football   New York
Jessica  11      43       tennis     Orlando
Brad     13      46       tennis     Orlando
  .
  .
  .
(multiple rows)

I want to get certain information about the students in a single query. This is what I would do in Postgres:

WITH sports_filter AS(
    SELECT * FROM t_student_details WHERE sport='tennis'
)
SELECT JSON_BUILD_OBJECT('max_age', (SELECT MAX(age) FROM sports_filter),
                         'min_age', (SELECT MIN(age) FROM sports_filter),
                         'city_wise_marks_mean', (SELECT JSON_AGG(mean_items)
                                                 FROM (SELECT city, AVG(marks) FROM sports_filter
                                                       GROUP BY city) AS mean_items)

The result of the above SQL query in Postgres would be

{"max_age": 46,
 "min_age": 43,
 "city_wise_marks_mean": [{"New York": 45, "Orlando": 44.5}]}     

As it is evident, I got multiple aggregations/information about students belonging to the sport 'tennis' in a single query. This way of querying also fetches only the necessary data and not everything.

How do I achieve this using Druid? I don't necessarily need the output response to be in the exact same format, but how do I fetch all these stats in the same query without having to fetch all the details of the students? Is it possible to get all this in a single query using Apache Druid?

2

There are 2 best solutions below

0
On

As I understand your Postgres query it should be as simple as using GROUPING SETS

WITH sports_filter AS(
    SELECT * FROM t_student_details WHERE sport='tennis'
)
SELECT City, MIN(Marks), MAX(Marks), AVG(Marks)
FROM sports_filters
GROUP BY CUBE (City)

The GROUP BY CUBE is short-hand for GROUP BY GROUPING SETS ( (City), () )

This should result in one row where City is NULL which is your overall min/max/avg and one row for each city.

0
On

Yes, it should be possible. Druid supports two query languages: Druid SQL and native queries. SELECT queries with the following structure are supported:

[ EXPLAIN PLAN FOR ]
[ WITH tableName [ ( column1, column2, ... ) ] AS ( query ) ]
SELECT [ ALL | DISTINCT ] { * | exprs }
FROM { <table> | (<subquery>) | <o1> [ INNER | LEFT ] JOIN <o2> ON condition }
[ WHERE expr ]
[ GROUP BY [ exprs | GROUPING SETS ( (exprs), ... ) | ROLLUP (exprs) | CUBE (exprs) ] ]
[ HAVING expr ]
[ ORDER BY expr [ ASC | DESC ], expr [ ASC | DESC ], ... ]
[ LIMIT limit ]
[ OFFSET offset ]
[ UNION ALL <another query> ]

along with these Aggregation functions.