I have created a scheduled job in Databricks that runs a notebook at regular intervals.
The notebook contains many commands, each in its own cell. One of these cells runs a Spark streaming query.
The scheduled job fails because the streaming query takes some time to finish, and the next command starts executing before the streaming query has completed.
How can I create a dependency between these two commands, so that the next command runs only after the streaming query has completed?
I am using the DataFrame API with PySpark. Thanks
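You need to wait for the query to finish. This is usually done by calling the `.awaitTermination()` method (doc) on the `StreamingQuery` object returned by `writeStream...start()`. The call blocks the cell until the query stops, so the next cell does not start until then.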