Hive Aggregate function for merging arrays

8.2k Views Asked by At

I need to merge arrays in a GROUP BY in HiveSQL. The table schema is something like this:

key int,
value ARRAY<int>

Now here is the SQL I would like to run:

SELECT key, array_merge(value)
FROM table_above
GROUP BY key

If this array_merge function only keeps unique values, that will be even better but not must.

Cheers, K

2

There are 2 best solutions below

0
On BEST ANSWER

there is no UDAF to perform that kind of operation. The following query should result in the same without much overhead (keep running one map and one reduce operation) removing duplicates

select key, collect_set(explodedvalue) from (
  select key, explodedvalue from table_above lateral view explode(value) e as explodedvalue
) t group by key;
0
On

There's been an answer here for a while but in case anyone else stumbles upon this; the combine_unique udf by Brickhouse is probably what you're looking for (the repo has instructions on how to get started with their UDFs).

Syntax is identical to what was proposed in the question:

SELECT key, combine_unique(value)
FROM table_above
GROUP BY key