What's the best way to merge a series of JSON-format binlog records into a Hudi table using Spark?


I have a Hudi table and some JSON-format binlog records, and I want to merge these binlog records into the Hudi table. As we know, binlog records need to be applied in order. What's the best way to do this? Should I traverse each binlog record in order and perform the corresponding operation on the Hudi table, or is there a more elegant way to achieve this?
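For illustration, assume each binlog entry carries an operation type, a monotonically increasing sequence number, and the row payload (field names here are hypothetical, as newline-delimited JSON):

```
{"op": "insert", "seq": 101, "data": {"id": 1, "name": "alice"}}
{"op": "update", "seq": 102, "data": {"id": 1, "name": "alicia"}}
{"op": "delete", "seq": 103, "data": {"id": 2}}
```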


1 answer:

Sumit Singh:

You can use a custom Spark job with ordered processing:

  1. Create a Spark job that reads the binlog records into a DataFrame.
  2. Sort the DataFrame by the binlog sequence number or timestamp.
  3. Apply the corresponding Hudi operation (insert, update, delete) for each record; a sketch follows this list.
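A minimal PySpark sketch of this approach, assuming the record shape from the question (`op`, `seq`, `data`) and hypothetical names for the table, record key, and paths. Rather than iterating row by row, it collapses each key to its final state by sequence number and relies on Hudi's precombine field to resolve ordering within the write:

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

# Hudi requires Kryo serialization and its Spark bundle on the classpath
spark = (
    SparkSession.builder
    .appName("binlog-to-hudi")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

# Hypothetical input path; each line: {"op": ..., "seq": ..., "data": {...}}
events = (
    spark.read.json("s3://bucket/binlog/")
    .select("data.*", "op", "seq")
)

# Keep only the latest event per record key, so each key carries exactly
# one final operation and ordering across writes no longer matters.
latest = (
    events
    .withColumn("rn", F.row_number().over(
        Window.partitionBy("id").orderBy(F.col("seq").desc())))
    .filter(F.col("rn") == 1)
    .drop("rn")
)

hudi_opts = {
    "hoodie.table.name": "my_table",                  # hypothetical
    "hoodie.datasource.write.recordkey.field": "id",  # hypothetical key
    # precombine field: on key collisions Hudi keeps the row with the larger seq
    "hoodie.datasource.write.precombine.field": "seq",
    "hoodie.datasource.write.operation": "upsert",
}

base_path = "s3://bucket/hudi/my_table"  # hypothetical

# Inserts and updates become a single upsert batch
(latest.filter(F.col("op") != "delete").drop("op")
 .write.format("hudi").options(**hudi_opts)
 .mode("append").save(base_path))

# Deletes go through a separate write with the delete operation
delete_opts = {**hudi_opts, "hoodie.datasource.write.operation": "delete"}
(latest.filter(F.col("op") == "delete").drop("op")
 .write.format("hudi").options(**delete_opts)
 .mode("append").save(base_path))
```

Note the trade-off: collapsing to the last event per key means intermediate states within a batch are never materialized in the table, which is usually fine for CDC merges; if you need every intermediate version, process the binlog in smaller, strictly ordered batches instead.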

You can also look at Hudi's DeltaStreamer (HoodieDeltaStreamer) with a custom source or transformer; it ingests CDC-style data and uses a source-ordering field to resolve out-of-order records, as in the sketch below.
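A rough spark-submit invocation, assuming a Hudi utilities bundle and the same hypothetical paths and field names as above; JsonDFSSource reads newline-delimited JSON from a directory, and --source-ordering-field plays the precombine role:

```
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  hudi-utilities-bundle.jar \
  --table-type COPY_ON_WRITE \
  --op UPSERT \
  --source-class org.apache.hudi.utilities.sources.JsonDFSSource \
  --source-ordering-field seq \
  --target-base-path s3://bucket/hudi/my_table \
  --target-table my_table \
  --hoodie-conf hoodie.datasource.write.recordkey.field=id \
  --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3://bucket/binlog/
```

In practice JsonDFSSource also needs a schema, typically supplied via --schemaprovider-class with Avro schema files; exact class names and properties vary by Hudi version, so check the docs for the release you run.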