I am trying to get an array of the categories associated with each product, plus the top-level parent category of each product in another column. By my logic, that means looking at the same values as in the categories array but selecting only the one where parent_id is NULL, which should return a single value and one record per id.
I really don't know the best way to structure this query. What I have kind of works, but it shows NULL in the parent category column for the categories that do have a parent ID, and it creates a second record for each product because I am forced to put that column in the GROUP BY. Basically, I don't think I am doing this in the correct or most efficient way.
Desired result:
+----+----------------+------------------+------------------------------------------------+------------------+
| id | name           | category_ids     | category_names                                 | parent_category  |
+----+----------------+------------------+------------------------------------------------+------------------+
| 1  | Product Name 1 | {111,222,333}    | {Electronics, computers, computer accessories} | Electronics      |
+----+----------------+------------------+------------------------------------------------+------------------+
My current query (which is not ideal):
select p.id,
       p.name,
       array_agg(category_id) as category_ids,
       regexp_replace(array_agg(c.name)::text,'"|''','','gi') as category_names,
       c1.name as parent_category
from products p
join product_categorizations pc on pc.product_id = p.id
join categories c on pc.category_id = c.id
full outer join (
    select name, id from categories
    where parent_id is null and name is not null
) c1 on c.id = c1.id
group by 1,2,5;
+----+----------------+------------------+-----------------------------------+------------------+
| id | name           | category_ids     | category_names                    | parent_category  |
+----+----------------+------------------+-----------------------------------+------------------+
| 1  | Product Name 1 | {111}            | {Electronics}                     | Electronics      |
+----+----------------+------------------+-----------------------------------+------------------+
| 1  | Product Name 1 | {222,333}        | {computers, computer accessories} | NULL             |
+----+----------------+------------------+-----------------------------------+------------------+
Replace the FULL JOIN with an aggregate FILTER clause.
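A sketch of what that could look like, using the table and column names from the question. It assumes PostgreSQL 9.4 or later (where the aggregate FILTER clause is available), and it leaves out the regexp_replace() cleanup of category_names for brevity:

```sql
-- One row per product; the parent category is picked by a filtered
-- aggregate instead of a FULL JOIN plus an extra GROUP BY column.
SELECT p.id
     , p.name
     , array_agg(pc.category_id) AS category_ids
     , array_agg(c.name)         AS category_names
       -- Only rows with parent_id IS NULL feed this aggregate.
       -- Assumption: each product has at most one top-level category;
       -- with several, min() returns the alphabetically first name.
     , min(c.name) FILTER (WHERE c.parent_id IS NULL) AS parent_category
FROM   products p
JOIN   product_categorizations pc ON pc.product_id = p.id
JOIN   categories c               ON c.id = pc.category_id
GROUP  BY p.id, p.name;
```

Because parent_category is now an aggregate, it no longer has to appear in the GROUP BY, so the second row per product goes away.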
(Why would you add AND name IS NOT NULL? Either way, min() ignores NULL values.)

When aggregating all products, and as long as referential integrity is enforced, it should be a bit faster to aggregate in a subquery first and join to products afterwards.
The point is that products is joined only after the rows have been aggregated.

Aside: "name" is not a very helpful column name.