Restart a job in Condor after a certain amount of time


I am running jobs on Condor and have noticed that, for some reason, a subset of my jobs start running but never complete. Is there a setting in the submit file that kills a job and then resubmits it if it takes longer than a certain amount of time to complete? This is similar to the question "Condor Timeout for idle jobs", except that I want Condor not only to kill the jobs but also to resubmit them.

Thanks!

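One way is to use the execute machine's policy expressions, which are set in that machine's Condor configuration and become part of its machine ClassAd (see the Condor user manual). PREEMPT evicts a running job when it becomes true, and KILL turns that eviction into an immediate hard kill instead of a graceful vacate. Something like:

# In the job's submit file: this job's time limit, in seconds
+MaxJobExecutionTime = xxx
# In the execute machine's configuration (set by the pool administrator)
START   = True
...
PREEMPT = $(ActivityTimer) > TARGET.MaxJobExecutionTime
KILL    = $(ActivityTimer) > TARGET.MaxJobExecutionTime

This way the machine evicts any job that has been running for more than MaxJobExecutionTime seconds. The evicted job returns to the queue as idle, so Condor will match it again and rerun it (from the beginning, unless the job checkpoints).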
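To make the timing logic concrete, here is a small Python model of the ActivityTimer-versus-MaxJobExecutionTime comparison above and of the evict-and-rerun cycle it produces. It is not HTCondor code: `activity_timer`, `should_evict` and `run_with_restarts` are hypothetical names, the job durations are made up, and the model assumes eviction happens at the first second the limit is exceeded (the real startd only re-evaluates its policy periodically, so eviction comes somewhat later).

```python
from typing import List, Optional, Tuple


def activity_timer(entered_current_activity: int, now: int) -> int:
    # Mirrors the config macro ActivityTimer = (time() - EnteredCurrentActivity):
    # seconds the slot has spent in its current activity (here, running the job).
    return now - entered_current_activity


def should_evict(entered_current_activity: int, now: int,
                 max_job_execution_time: Optional[int]) -> bool:
    # Models: ActivityTimer > MaxJobExecutionTime.
    # None stands in for a job that never set MaxJobExecutionTime; in ClassAds
    # that comparison is UNDEFINED, which does not trigger the policy.
    if max_job_execution_time is None:
        return False
    return activity_timer(entered_current_activity, now) > max_job_execution_time


def run_with_restarts(durations: List[Optional[int]],
                      limit: int) -> Tuple[Optional[int], int]:
    # Hypothetical simulation. durations[i] is how long attempt i would run if
    # left alone; None means it hangs forever. An attempt that outlives `limit`
    # is evicted, goes back to the queue, and the next attempt starts.
    # Returns (index of the attempt that finished, or None; total seconds used).
    clock = 0
    for attempt, duration in enumerate(durations):
        start = clock
        if duration is None or duration > limit:
            # Jump to the first whole second at which the policy is true.
            clock = start + limit + 1
            assert should_evict(start, clock, limit)
            continue
        clock = start + duration
        return attempt, clock
    return None, clock


if __name__ == "__main__":
    print(run_with_restarts([None, None, 120], limit=600))  # (2, 1322)
```

With a 600 s limit, a job whose first two attempts hang and whose third finishes in 120 s completes on attempt index 2 after 601 + 601 + 120 = 1322 s of slot time. Note that the limit bounds the wasted time per attempt, not the number of attempts; a job that always hangs would keep being rerun.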