How do I aggregate an array of maps in Java Spark


I have a dataset "events" that includes an array-of-maps column. I want to reduce it to a single map that aggregates the amounts and counts.

Currently, I'm running the following statement:

events.select(functions.col("totalAmounts")).collectAsList()

which returns the following:

[
    [
        Map(totalCreditAmount -> 10, totalDebitAmount -> 50)
    ],
    [
        Map(totalCreditAmount -> 50, totalDebitAmount -> 100)
    ]   
]

I want to aggregate the amounts and counts and have it return:

[
    Map(totalCreditAmount -> 60, totalDebitAmount -> 150)
]   

1 Answer

Lingesh.K:

You can try using the explode function on the array-of-maps column to flatten it into one map per row, then explode each map into key/value pairs and perform a sum aggregation:

from pyspark.sql import functions as F

# Explode the array so each map becomes its own row.
df = events.select(F.explode("totalAmounts").alias("flattenedAmounts"))

# Exploding a map column produces "key" and "value" columns; sum the values per key.
df = df.select(F.explode(df.flattenedAmounts)).groupBy("key").agg(F.sum("value").alias("value"))

# Collect the (key, value) rows into a single Python dict on the driver.
final_result_as_map = df.rdd.collectAsMap()

final_result_as_map should now be a single map of the shape you expect, e.g. {'totalCreditAmount': 60, 'totalDebitAmount': 150}.
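
Since the question is about the Java Spark API, here is a rough Java equivalent of the same approach. It is only a sketch: it assumes events is a Dataset<Row>, that the column is named totalAmounts as in the question, and that the amounts are whole numbers (use getDouble instead of getLong if they are decimals).

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode;
import static org.apache.spark.sql.functions.sum;

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Flatten the array of maps into one map per row.
Dataset<Row> flattened = events.select(explode(col("totalAmounts")).alias("amounts"));

// Exploding a map column yields "key" and "value" columns; sum the values per key.
Dataset<Row> summed = flattened
        .select(explode(col("amounts")))
        .groupBy("key")
        .agg(sum("value").alias("value"));

// Collect the per-key sums into a single java.util.Map on the driver.
Map<String, Long> result = new HashMap<>();
for (Row row : summed.collectAsList()) {
    result.put(row.getString(0), row.getLong(1)); // assumes integral amounts
}

As in the PySpark version, the heavy lifting (both explodes and the groupBy) runs on the executors; only the small aggregated result is collected to the driver.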