Does foreachBatch method is guaranteed to run a callback "exclusively", or it will be started asynchronously? I'm just working on a custom sink with relying on consecutive, non-concurrent batches, and it's quite important.
For example, a code:
.writeStream()
.trigger(Trigger.ProcessingTime(1, TimeUnit.MINUTES))
.foreachBatch((dataframe, batchId) -> {
//
// Can it run the callback next time before a previous one has completed?
// (i.e. if the previous is working longer than the trigger's time interval)
//
})
There is a similar API in Spark DStreams, foreachRDD, and docs explain how it works:
By default, output operations are executed one-at-a-time.
But I couldn't find anything like that in structured streaming docs. Did I miss anything?
(Of course, I could just run and check, but it would be good to have a clear answer to ensure that it's not an internal implementation detail and it will not be suddenly changed in a minor version upgrade.)