Spark checkpointing data cleaning frequency


I am running the RecoverableNetworkWordCount example to understand checkpointing, and I see checkpoint data being created every second. Is that because my StreamingContext batch interval is 1 second? Also, the checkpoint directory keeps only the last 10 checkpoints. Is this controlled by some property? What happens to the checkpoints that were captured earlier?
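To check the observed frequency and count outside of Spark, a small script can read the batch times from the file names. This is only a sketch: it assumes the files are named `checkpoint-<batch time in ms>`, optionally with a `.bk` suffix, and the helper names `checkpoint_times` and `intervals` are made up for illustration.

```python
import re
from pathlib import Path

# Assumed naming pattern: "checkpoint-<batch time in ms>", optionally ending in ".bk".
CHECKPOINT_NAME = re.compile(r"^checkpoint-(\d+)(\.bk)?$")


def checkpoint_times(directory):
    """Return the sorted, de-duplicated batch times (ms) found in file names."""
    times = set()
    for entry in Path(directory).iterdir():
        match = CHECKPOINT_NAME.match(entry.name)
        if match:
            times.add(int(match.group(1)))
    return sorted(times)


def intervals(times):
    """Return the gaps (ms) between consecutive checkpoint times."""
    return [later - earlier for earlier, later in zip(times, times[1:])]
```

Calling `intervals(checkpoint_times(path))` on a local copy of the checkpoint directory should show gaps of about 1000 ms if the checkpoints really follow a 1-second batch interval, and `len(checkpoint_times(path))` shows how many are retained.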

When does checkpoint data get cleared if we don't set the spark.cleaner.referenceTracking.cleanCheckpoints property to true?

The checkpoint files are identified as the MS-DOS executable type. Is there a way to decode the contents of a checkpoint file once it has been created? Thanks.
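As a first step before any real decoding, the leading bytes of a checkpoint file can be inspected to see why a tool might report that file type. The sketch below does not decode the checkpoint contents; it only prints a hex prefix and checks for the `MZ` signature that MS-DOS executables start with. The helper names are hypothetical, and a file-type tool may rely on other heuristics than this one.

```python
from pathlib import Path


def head_hex(path, count=16):
    """Return the first `count` bytes of a file as space-separated hex pairs."""
    with Path(path).open("rb") as handle:
        data = handle.read(count)
    return " ".join(f"{byte:02x}" for byte in data)


def looks_like_dos_executable(path):
    """True if the file starts with 'MZ', the MS-DOS executable signature."""
    with Path(path).open("rb") as handle:
        return handle.read(2) == b"MZ"
```

If `looks_like_dos_executable` returns False for a checkpoint file, the reported type is probably a misclassification of binary data rather than an actual executable.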
