I have a Kafka and Spark Streaming stack. The Spark Streaming app subscribes to one topic, which receives syslog messages. Around 100 different message types can arrive on this topic. How can I efficiently segregate them so I can apply the relevant schema to each type before storing it in Parquet? If I write 100 filters, it will unnecessarily create 100 DataFrames for a 2-second micro-batch, and then another 100 DataFrames for the next micro-batch.
Is there any way to segregate them directly into separate DataFrames?
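
For illustration, here is a minimal sketch of the filter-per-type approach I'm trying to avoid. It assumes Structured Streaming with the Kafka source and JSON-formatted payloads; the broker address, topic name, output paths, and the two sample schemas are just placeholders standing in for the ~100 real message types.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("syslog-segregation").getOrCreate()

# Placeholder schemas; in reality there are ~100 of these, one per message type.
schemas = {
    "sshd": StructType([
        StructField("host", StringType()),
        StructField("user", StringType()),
        StructField("port", IntegerType()),
    ]),
    "firewall": StructType([
        StructField("host", StringType()),
        StructField("action", StringType()),
    ]),
}

# Raw syslog lines from the Kafka topic (broker and topic name are placeholders).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "syslog")
       .load()
       .selectExpr("CAST(value AS STRING) AS value"))

# One filter + one streaming query per message type -- this is the part
# I want to avoid, since it multiplies DataFrames every micro-batch.
for msg_type, schema in schemas.items():
    df = (raw.filter(col("value").contains(msg_type))
             # assumes the payload is JSON; real syslog parsing would differ
             .select(from_json(col("value"), schema).alias("parsed"))
             .select("parsed.*"))
    (df.writeStream
       .format("parquet")
       .option("path", f"/data/syslog/{msg_type}")
       .option("checkpointLocation", f"/checkpoints/{msg_type}")
       .start())
```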