How can Python wait for a batch SGE script to finish execution?


I have a problem I'd like your help solving.

I am working in Python and I want to do the following:

  • call an SGE batch script on a server
  • see if it works correctly
  • do something

What I do now is approximately the following:

    import subprocess
    try:
       tmp = subprocess.call(qsub ....)
       if tmp != 0:
           error_handler_1()
       else:
           correct_routine()
    except:
       error_handler_2()

My problem is that once the script is submitted to SGE, my Python script interprets the submission as a success and carries on as if the job had finished.

Do you have any suggestions for how I could make the Python code wait for the actual result of the SGE script?

By the way, I tried using qrsh, but I don't have permission to use it on the SGE.

Thanks!


There are 2 best solutions below

1
On BEST ANSWER

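From your code, you want the program to wait for the job to finish and get its return code, right? If so, the qsub -sync y option is likely what you want. With it, qsub does not return until the job completes, and its exit code is then that of the job: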

http://gridscheduler.sourceforge.net/htmlman/htmlman1/qsub.html
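
Applied to the code in the question, this could look like the sketch below. It assumes qsub is on the PATH and that your site allows -sync y. The helper names build_qsub_command and run_and_wait are made up for illustration.

```python
import subprocess

def build_qsub_command(script, extra_args=()):
    """Build a qsub command line that blocks until the job completes."""
    # -sync y makes qsub wait for the job instead of returning right after submission
    return ["qsub", "-sync", "y"] + list(extra_args) + [script]

def run_and_wait(script, extra_args=()):
    """Submit the script and return qsub's exit code once the job is done."""
    return subprocess.call(build_qsub_command(script, extra_args))

# Usage (the script name is a placeholder):
#     tmp = run_and_wait("my_job.sh")
#     if tmp != 0: error_handler_1()
#     else: correct_routine()
```

Because qsub now blocks, the existing `tmp != 0` check in the question should reflect the job's outcome rather than just the submission.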

1
On

An additional answer, for easier processing: use the Python drmaa module, which allows more complete control over SGE jobs. Working code based on the example in its documentation follows (it assumes a sleeper.sh script in the current directory). Note that the -b n option is needed to run a .sh script; otherwise a binary is expected by default.

    import drmaa
    import os

    def main():
        """Submit a job and wait for it to finish.
        Note: needs a file called sleeper.sh in the current directory.
        """
        s = drmaa.Session()
        s.initialize()
        print('Creating job template')
        jt = s.createJobTemplate()
        jt.remoteCommand = os.path.join(os.getcwd(), 'sleeper.sh')
        jt.args = ['42', 'Simon says:']
        jt.joinFiles = False
        # -b n: treat the command as a script rather than a binary
        jt.nativeSpecification = "-m abe -M mymail -q so-el6 -b n"
        jobid = s.runJob(jt)
        print('Your job has been submitted with id ' + jobid)
        # Block here until the job has ended
        retval = s.wait(jobid, drmaa.Session.TIMEOUT_WAIT_FOREVER)
        print('Job: {0} finished with status {1}'.format(retval.jobId, retval.hasExited))
        print('Cleaning up')
        s.deleteJobTemplate(jt)
        s.exit()

    if __name__ == '__main__':
        main()
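
To choose between the question's correct_routine() and its error handlers, the value returned by s.wait() can be inspected. This is a minimal sketch, assuming the hasExited, wasAborted and exitStatus fields of the job info object in the drmaa Python package. job_succeeded is a hypothetical helper, and FakeJobInfo only stands in for a real result so the snippet runs without a cluster.

```python
from collections import namedtuple

def job_succeeded(info):
    """Return True if the job ran to completion with exit status 0."""
    if info.wasAborted or not info.hasExited:
        return False
    return int(info.exitStatus) == 0

# Stand-in for the object returned by s.wait(); it has only the fields used above.
FakeJobInfo = namedtuple("FakeJobInfo", "jobId hasExited wasAborted exitStatus")

if __name__ == "__main__":
    ok = FakeJobInfo("101", True, False, 0)
    failed = FakeJobInfo("102", True, False, 3)
    print(job_succeeded(ok), job_succeeded(failed))
```

In the script above, job_succeeded(retval) could replace the check on the return value of subprocess.call.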