I have a query that filters the emails of people who "have a quantity" bigger than the median over all quantities found in the table:
SELECT
AVG(quantity) AS avg_quantity,
email
FROM table
GROUP BY email
HAVING AVG(quantity) > (SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY quantity) FROM table)
The result is:
123.12 [email protected]
0.5 [email protected]
In addition to this, I would like to add a column which will have for all rows the same value, that of the median calculated in the HAVING clause above. If we suppose this median to be equal to 0.01, then I would like to have the following result:
123.12 [email protected] 0.01
0.5 [email protected] 0.01
I tried to doing a Cartesian product:
WITH (
SELECT
AVG(quantity) AS avg_quantity,
email
FROM table
GROUP BY email
HAVING AVG(quantity) > (SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY quantity) FROM table)
) AS tmp
SELECT tmp.*, PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY table.quantity) from tmp, table
But I get this error:
ERROR: column "tmp. avg_quantity" must appear in the GROUP BY clause or be used in an aggregate function
How can I achieve the above expected output with 3 columns?
You can use it as an uncorrelated scalar subquery right in the
selectlist. Demo at db<>fiddle:You're already using the query this way, just elsewhere. You can actually re-use it for both with the CTE you already tried out:
Your idea with a Cartesian product would work too. You're computing one median for the whole table, so the same single result gets glued to every row. It has to be grouped by, but being identical throughout, it does not affect the current grouping.