I am looking to see if there is something like the AWS Glue "bookmark" feature in Spark. I know Spark has checkpointing, which works well for an individual data source. In Glue, we could use a single bookmark to keep track of all the files across the different tables involved in a job.
Is there something like the Glue "bookmark" feature in Spark that keeps track at the job level?
Answer:
You can use Spark Structured Streaming in combination with Trigger.Once() for that.
The stream will essentially run just one micro-batch, which is equivalent to a single batch job, while leveraging the checkpointing capability that keeps track of the processed files.
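For example, a minimal PySpark sketch of this pattern (the S3 paths, the Parquet format, and the schema handling here are illustrative assumptions, not part of the original answer):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bookmark-like-incremental-load").getOrCreate()

# Illustrative source/target/checkpoint locations (assumptions).
source_path = "s3://my-bucket/source/"
target_path = "s3://my-bucket/target/"
checkpoint_path = "s3://my-bucket/checkpoints/incremental-load/"

# File-based streaming sources need an explicit schema; one common approach is
# to infer it from an initial batch read of the same location.
schema = spark.read.parquet(source_path).schema

# Read the source directory as a stream; only files not yet recorded in the
# checkpoint will be picked up on each run.
incoming = (
    spark.readStream
    .format("parquet")
    .schema(schema)
    .load(source_path)
)

# Trigger.Once (trigger(once=True) in PySpark) processes the available data in
# a single micro-batch and then stops, so the job behaves like a batch job.
# The checkpointLocation persists which input files were already processed,
# playing a role similar to a Glue job bookmark across runs.
query = (
    incoming.writeStream
    .format("parquet")
    .option("path", target_path)
    .option("checkpointLocation", checkpoint_path)
    .trigger(once=True)
    .start()
)
query.awaitTermination()
```

Rerunning the same script later processes only the files that arrived since the previous run, because the checkpoint directory records what has already been consumed.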