YARN scheduler: reject application after timeout


I have a cluster with one queue for low-priority jobs. These jobs can wait for hours before being executed; that does not matter. The only problem is that my applications run under a `timeout` command that kills any suspiciously long-running job. I recently added a new job that takes up the queue's entire capacity and runs for several hours. The behaviour I would like is for incoming jobs to be rejected after a certain amount of time if no capacity could be allocated to them. This way, they could give up and come back later. I do not want to modify my own timeout thresholds: their semantics are supposed to be how long the job runs, not how long scheduling plus job execution takes in total.

I did not find anything like this after some research. Is anybody aware of an existing scheduler that allows this, or of a cheap way to do it with an existing scheduler (like the default CapacityScheduler)?
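To make the desired behaviour concrete, here is a rough client-side sketch of what I mean: a watchdog that polls the application state and kills the application if it has not left the waiting states within a deadline. This is not a scheduler feature, only an approximation under assumptions. The names `wait_for_start`, `rm_get_state`, `rm_kill` and `WAITING_STATES` are hypothetical helpers of mine; the REST helpers assume the ResourceManager's Cluster Application State endpoint (`/ws/v1/cluster/apps/{appid}/state`) is reachable without authentication, which will not hold on a secured cluster.

```python
import json
import time
import urllib.request

# States in which YARN knows about the application but has not started it yet.
WAITING_STATES = {"NEW", "NEW_SAVING", "SUBMITTED", "ACCEPTED"}


def wait_for_start(get_state, kill, max_wait_s, poll_s=10.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until the app leaves the waiting states.

    Returns True as soon as the state is no longer a waiting state
    (RUNNING, but also FINISHED/FAILED/KILLED). If max_wait_s elapses
    first, calls kill() and returns False.
    """
    deadline = clock() + max_wait_s
    while True:
        if get_state() not in WAITING_STATES:
            return True
        if clock() >= deadline:
            kill()
            return False
        sleep(poll_s)


def rm_get_state(rm_url, app_id):
    # Assumption: unauthenticated access to the ResourceManager REST API.
    url = f"{rm_url}/ws/v1/cluster/apps/{app_id}/state"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["state"]


def rm_kill(rm_url, app_id):
    # Ask the ResourceManager to move the application to KILLED.
    url = f"{rm_url}/ws/v1/cluster/apps/{app_id}/state"
    req = urllib.request.Request(
        url,
        data=json.dumps({"state": "KILLED"}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    urllib.request.urlopen(req).close()
```

Usage would look like `wait_for_start(lambda: rm_get_state(rm, app_id), lambda: rm_kill(rm, app_id), max_wait_s=1800)`, where `rm` and `app_id` are placeholders, with my existing `timeout` wrapper only starting to count once this returns True. I would still prefer something built into the scheduler over this kind of polling.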

PS: justification for the apache-spark tag is that it will give this question broader visibility and will have more chance to reach answerers and future readers looking for questions about YARN-Spark.
