I read about checkpoint and it looks great for my needs, but I couldn't find a good example of how to use it.
My questions are:
Should I specify the checkpoint dir? Is it possible to do it like this:
`df.checkpoint()`
Are there any optional params that I should be aware of?
Is there a default checkpoint dir, or must I specify one?
When I checkpoint a dataframe and then reuse it, does it automatically read the data from the dir where the files were written?
It would be great if you could share an example of using checkpoint in PySpark with some explanation. Thanks!
You should assign the checkpointed dataframe to a variable, as `checkpoint` "Returns a checkpointed version of this Dataset" (https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrame.checkpoint.html), so a bare `df.checkpoint()` does not change `df` itself.

The only parameter is `eager`, which dictates whether you want the checkpoint to trigger an action and be saved immediately. It is `True` by default, and you usually want to keep it that way.
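For example (a minimal sketch; `spark` is an existing `SparkSession`, `df` is any dataframe, and the checkpoint directory is assumed to have been set already, as described below):

```python
# checkpoint() returns a new, checkpointed dataframe; reassign it,
# otherwise the checkpoint is computed but never used.
df = df.checkpoint()  # eager=True by default: materialized right away

# With eager=False nothing is written yet; the checkpoint is saved
# the first time an action (count, write, ...) evaluates df.
df = df.checkpoint(eager=False)
```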
You have to set the checkpoint directory with `SparkContext.setCheckpointDir(dirName)` somewhere in your script before using checkpoints. Alternatively, if you want to save to memory instead, you can use `localCheckpoint()` instead of `checkpoint()`, but that is not reliable: in case of issues or after termination the checkpoints will be lost (though it should be faster, as it uses the caching subsystem instead of writing to disk).

And yes, the checkpointed data is read back automatically when the dataframe is reused; if you look at the history server, there should be "load data" nodes (I don't remember the exact name) at the start of blocks/queries.
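Putting it all together, a minimal runnable sketch (the app name and the `/tmp/spark-checkpoints` path are placeholders; on a real cluster use a reliable location such as HDFS or S3):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("checkpoint-demo").getOrCreate()

# There is no default checkpoint dir; this must be called once
# before the first call to checkpoint().
spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

df = spark.range(1_000_000)

# Truncates the lineage and writes the data to the checkpoint dir.
df = df.checkpoint()

# Reusing df now reads the checkpointed files automatically
# instead of recomputing the whole plan.
print(df.count())

# In-memory alternative: faster (uses the caching subsystem) but lost
# if executors die or the application terminates.
df2 = spark.range(100).localCheckpoint()
```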