How to specify job timeout in Spark?


I have a Spark job running on Kubernetes using the spark-on-k8s-operator. This job usually takes less than 5 minutes to complete, but sometimes it gets stuck because of lost executors, which I'm still investigating.

How can I specify a timeout in Spark to make the driver kill all the executors and itself if the execution exceeds the specified timeout?

1 Answer

BEST ANSWER

spark.scheduler.excludeOnFailure.unschedulableTaskSetTimeout

The timeout in seconds to wait to acquire a new executor and schedule a task before aborting a TaskSet which is unschedulable because all executors are excluded due to task failures.

from https://spark.apache.org/docs/latest/configuration.html
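
For illustration, a minimal PySpark sketch of setting this property when building the session (the 300s value and the app name are arbitrary assumptions for the example, not recommendations):

```python
from pyspark.sql import SparkSession

# Abort a TaskSet that stays unschedulable (all executors excluded due to
# task failures) after 300 seconds. The value is only an example; tune it
# to your job's expected runtime.
spark = (
    SparkSession.builder
    .appName("job-with-unschedulable-taskset-timeout")  # hypothetical name
    .config("spark.scheduler.excludeOnFailure.unschedulableTaskSetTimeout", "300s")
    .getOrCreate()
)
```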

As far as I'm aware, the Spark operator helm chart doesn't offer the spark.scheduler.excludeOnFailure.unschedulableTaskSetTimeout configuration option.

See https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/charts/spark-operator-chart/README.md
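
If you supply the property some other way, for example in the application code as sketched above or, as far as I recall the operator supports, through the SparkApplication spec's sparkConf map, a quick sanity check from inside the driver (a sketch, not operator-specific) is:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Print the value that actually took effect for this run; falls back to a
# placeholder string when the key was never set and Spark's default applies.
print(spark.conf.get(
    "spark.scheduler.excludeOnFailure.unschedulableTaskSetTimeout",
    "not set (Spark default applies)",
))
```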