Databricks Streaming scheduled job fails


I have created a scheduled job in Databricks that executes a notebook at regular intervals.

The notebook contains many commands split across cells. One of those cells runs a Spark streaming query.

The scheduled job fails because the streaming query takes some time to complete, and the next command starts executing before the streaming query has finished. As a result, the job fails.

How can I create a dependency between these two commands? I want the next command to run only after the streaming query has completed.

I am using the DataFrame API with PySpark. Thanks.

Answer

You need to wait for the query to finish. This is usually done with the awaitTermination() method of the StreamingQuery (see the PySpark Structured Streaming documentation), like this:

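query = df.writeStream.....   # ... .start() returns at once; the query runs in the background
query.awaitTermination()      # blocks until the query has stopped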