I have a data table in Postgres containing a column age
that contains people ages. Unfortunately this data is dirty, so it includes string entries, which might represent ranges, featuring all of these formats:
21
25-30
30-35 years
Using Apache Superset I am making a chart from this data. I can apply functions on this column, so for example I am using this expression to get the lower bound of ranges split_part(split_part("age",' ',1),'-', 1)::int
.
However, the lower bound is not really representative of the data. I'd like when I have a range to get the average. I've tried:
avg(unnest(string_to_array(split_part("age",' ',1),'-')::int[]))
But I get:
aggregate function calls cannot contain set-returning function calls
LINE 1: SELECT avg(unnest(string_to_array(split_part("ageValue",' ',...
^
HINT: You might be able to move the set-returning function into a LATERAL FROM item.
This is available as a fiddle.
I've seen suggestions of how this can be done with a select or function, but I cannot use any of these in the Superset X-axis query. Can it be done with expressions only and if so - how?
Obviously, the silver bullet would be to clean up your data.
For a quick fix with a single expression:
With aggregation:
fiddle
string_to_table()
was added with Postgres 14.