Take average from a delimited string in a single expression

114 Views Asked by At

I have a data table in Postgres containing a column age that contains people ages. Unfortunately this data is dirty, so it includes string entries, which might represent ranges, featuring all of these formats:

21
25-30
30-35 years

Using Apache Superset I am making a chart from this data. I can apply functions on this column, so for example I am using this expression to get the lower bound of ranges split_part(split_part("age",' ',1),'-', 1)::int.

However, the lower bound is not really representative of the data. I'd like when I have a range to get the average. I've tried:

avg(unnest(string_to_array(split_part("age",' ',1),'-')::int[]))

But I get:

aggregate function calls cannot contain set-returning function calls
LINE 1: SELECT avg(unnest(string_to_array(split_part("ageValue",' ',...
               ^
HINT:  You might be able to move the set-returning function into a LATERAL FROM item.

This is available as a fiddle.

I've seen suggestions of how this can be done with a select or function, but I cannot use any of these in the Superset X-axis query. Can it be done with expressions only and if so - how?

1

There are 1 best solutions below

5
On BEST ANSWER

Obviously, the silver bullet would be to clean up your data.

For a quick fix with a single expression:

SELECT *, CASE
           WHEN age ~ ' ' THEN (split_part(age, '-', 1)::float8 + split_part(split_part(age, ' ', 1), '-', 2)::float8) / 2
           WHEN age ~ '-' THEN (split_part(age, '-', 1)::float8 + split_part(           age         , '-', 2)::float8) / 2
           ELSE age::float8
          END AS avg_age
FROM   data;

With aggregation:

SELECT id, age, avg(age_bound) AS avg_age
FROM  (
   SELECT *, string_to_table(split_part(age, ' ', 1), '-')::float8 AS age_bound
   FROM   data
   ) sub
GROUP  BY id, age
ORDER  BY id;  -- optional

fiddle

string_to_table() was added with Postgres 14.