How do I aggregate an array of maps in Java Spark


I have a dataset "events" that includes an array-of-maps column. I want to reduce it to a single map that aggregates the amounts and counts.

Currently, I'm running the following statement:

events.select(functions.col("totalAmounts")).collectAsList()

which returns the following:

[
    [
        Map(totalCreditAmount -> 10, totalDebitAmount -> 50)
    ],
    [
        Map(totalCreditAmount -> 50, totalDebitAmount -> 100)
    ]   
]

I want to aggregate the amounts and counts and have it return:

[
    Map(totalCreditAmount -> 60, totalDebitAmount -> 150)
]   

1 Answer

Lingesh.K:

You can try using the explode function on the array-of-maps column to flatten it into one map per row, then explode each map into key/value pairs and perform a sum aggregation:

from pyspark.sql import functions as F

# Explode the array so each map becomes its own row.
df = events.select(F.explode("totalAmounts").alias("flattenedAmounts"))

# Exploding a map column produces "key" and "value" columns; sum the values per key.
df = df.select(F.explode(df.flattenedAmounts)).groupBy("key").agg(F.sum("value").alias("value"))

# Collect the (key, value) rows into a single Python dict on the driver.
final_result_as_map = df.rdd.collectAsMap()

final_result_as_map should now be a single map of the shape you expect, e.g. {'totalCreditAmount': 60, 'totalDebitAmount': 150}.
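
Since the question is about the Java Spark API, here is a rough Java equivalent of the same approach. It is only a sketch: it assumes events is a Dataset<Row>, that the column is named totalAmounts as in the question, and that the amounts are whole numbers (use getDouble instead of getLong if they are decimals).

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode;
import static org.apache.spark.sql.functions.sum;

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Flatten the array of maps into one map per row.
Dataset<Row> flattened = events.select(explode(col("totalAmounts")).alias("amounts"));

// Exploding a map column yields "key" and "value" columns; sum the values per key.
Dataset<Row> summed = flattened
        .select(explode(col("amounts")))
        .groupBy("key")
        .agg(sum("value").alias("value"));

// Collect the per-key sums into a single java.util.Map on the driver.
Map<String, Long> result = new HashMap<>();
for (Row row : summed.collectAsList()) {
    result.put(row.getString(0), row.getLong(1)); // assumes integral amounts
}

As in the PySpark version, the heavy lifting (both explodes and the groupBy) runs on the executors; only the small aggregated result is collected to the driver.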