How to perform multiple time Window operation on Streaming DataFrame?


I have a streaming DataFrame with 3 columns: [time: Timestamp, col1: Double, col2: Double]. I want to perform the following operation:

dataFrame.withWatermark("time", "10 seconds")
         .groupBy(window(col("time"), "10 seconds", "1 second"))
         .agg(mean("col1") with a window of 10 seconds, max("col2") with a window of 5 seconds)  // pseudocode
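For reference, what Structured Streaming does support is several aggregate functions over the *same* window spec in a single `groupBy`; it is only differing windows per aggregate that fail. A minimal sketch of the supported shape, using a placeholder `rate` source in place of the question's actual stream (the column names follow the question; the source and session setup are assumptions):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, window, mean, max}

val spark = SparkSession.builder().appName("windowAggSketch").getOrCreate()

// Placeholder streaming source shaped like the question's DataFrame:
// [time: Timestamp, col1: Double, col2: Double]
val dataFrame = spark.readStream
  .format("rate")            // built-in test source; substitute your own
  .load()
  .selectExpr(
    "timestamp AS time",
    "CAST(value AS DOUBLE) AS col1",
    "CAST(value AS DOUBLE) * 2 AS col2")

// Supported: multiple aggregates, one shared window spec.
val singleWindowAgg = dataFrame
  .withWatermark("time", "10 seconds")
  .groupBy(window(col("time"), "10 seconds", "1 second"))
  .agg(mean("col1").as("avg_col1"), max("col2").as("max_col2"))
```

Both aggregates here share the 10-second sliding window; there is no way to attach a different window to `max("col2")` inside the same `agg`.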

2 Answers

Answer by Tathagata Das:

Multiple aggregations on different sets of grouping keys (a different window means a different grouping key) in a single streaming query are not yet supported. You would have to run 2 different queries.
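The two-query workaround can be sketched as follows: each window spec becomes its own streaming query with its own sink. This assumes `spark` and the streaming `dataFrame` from the question are already defined; the `console` sink is a placeholder:

```scala
import org.apache.spark.sql.functions.{col, window, mean, max}

// Query 1: 10-second sliding window, mean of col1.
val meanQuery = dataFrame
  .withWatermark("time", "10 seconds")
  .groupBy(window(col("time"), "10 seconds", "1 second"))
  .agg(mean("col1").as("avg_col1"))
  .writeStream
  .outputMode("append")
  .format("console")         // placeholder sink; substitute your own
  .start()

// Query 2: 5-second sliding window, max of col2.
val maxQuery = dataFrame
  .withWatermark("time", "10 seconds")
  .groupBy(window(col("time"), "5 seconds", "1 second"))
  .agg(max("col2").as("max_col2"))
  .writeStream
  .outputMode("append")
  .format("console")
  .start()

// Block until either query terminates.
spark.streams.awaitAnyTermination()
```

Each query maintains its own state and checkpoint, so the two results land in separate sinks rather than in one combined row.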

Answer by Shashi:

Dynamic rules that contain multiple aggregations (avg, max, etc., all of which Spark supports in batch mode) cannot be applied to Spark Structured Streaming up to version 2.2. Even splitting the query into separate aggregations and then joining them does not help: Spark still treats the result as multiple streaming aggregations and throws an exception.

Example from the logical plan:

    Aggr1: Aggregate [EventTime#29, CategoryName#15], [EventTime#29, CategoryName#15, sum(ItemValue#10) AS sum(ItemValue)#64]
    Aggr2: Aggregate [EventTime#84, CategoryName#105], [EventTime#84, CategoryName#105, avg(ItemValue#100) AS avg(ItemValue)#78]

The resulting exception:

    org.apache.spark.sql.AnalysisException: Multiple streaming aggregations are not supported with streaming DataFrames/Datasets;;