I have a Kafka and Spark Streaming stack. The Spark Streaming app subscribes to one topic, which receives syslog messages. Around 100 different message types can arrive on this topic. How can I efficiently segregate them so I can apply the relevant schema to each type before storing it in Parquet? If I write 100 filters, it will unnecessarily create 100 DataFrames for a 2-second micro-batch, and then another 100 DataFrames for the next micro-batch.
Is there any way to segregate them directly into separate DataFrames?
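
For illustration, here is a minimal sketch of the filter-per-type approach I'm trying to avoid. It assumes Structured Streaming with the Kafka source and JSON-formatted payloads; the broker address, topic name, output paths, and the two sample schemas are just placeholders standing in for the ~100 real message types.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("syslog-segregation").getOrCreate()

# Placeholder schemas; in reality there are ~100 of these, one per message type.
schemas = {
    "sshd": StructType([
        StructField("host", StringType()),
        StructField("user", StringType()),
        StructField("port", IntegerType()),
    ]),
    "firewall": StructType([
        StructField("host", StringType()),
        StructField("action", StringType()),
    ]),
}

# Raw syslog lines from the Kafka topic (broker and topic name are placeholders).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "syslog")
       .load()
       .selectExpr("CAST(value AS STRING) AS value"))

# One filter + one streaming query per message type -- this is the part
# I want to avoid, since it multiplies DataFrames every micro-batch.
for msg_type, schema in schemas.items():
    df = (raw.filter(col("value").contains(msg_type))
             # assumes the payload is JSON; real syslog parsing would differ
             .select(from_json(col("value"), schema).alias("parsed"))
             .select("parsed.*"))
    (df.writeStream
       .format("parquet")
       .option("path", f"/data/syslog/{msg_type}")
       .option("checkpointLocation", f"/checkpoints/{msg_type}")
       .start())
```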