Avoid printing job exit codes in SGE with option -sync yes


I have a Perl script which submits a bunch of array jobs to SGE. I want all the jobs to run in parallel to save time, and I want the script to wait for them all to finish before going on to the next processing step, which integrates information from all the SGE output files and produces the final output.

In order to send all the jobs into the background and then wait, I use Parallel::ForkManager and a loop:

my $fork_manager = Parallel::ForkManager->new($max_procs);
# $max_procs: max nb of processes to run simultaneously
for my $job (@jobs) {
    $fork_manager->start and next; # parent: starts a child process and moves on
    system "qsub <qsub_options> ./script.plx";
    $fork_manager->finish; # terminates the child process
}
$fork_manager->wait_all_children;
<next processing step, local>

For the "waiting" part to work, however, I have had to add "-sync yes" to the qsub options. But as a side effect, SGE prints the exit code of every task in every array job, and since there are many jobs and the individual tasks are light, this renders my shell essentially unusable because of all the interrupting messages while the qsub jobs are running.

How can I get rid of those messages? If anything, I would be interested in checking qsub's exit code for the jobs (so I can check that everything went OK before the next step), but not in one exit code per task (I log the tasks' errors via option -e anyway, in case I need them).

1 Answer


The simplest solution would be to redirect the output from qsub somewhere, e.g.

system("qsub <qsub options> ./script.plx >/dev/null 2>&1");

but this masks errors that you might want to see. Alternatively, you can use open() to start the subprocess and read its output, printing something only if the subprocess reports an error.
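A minimal sketch of that open() approach, wrapped in a hypothetical helper sub (run_quietly is an illustrative name, not a real API): run a command, capture its combined output, and show it only when the command fails. For this question, $cmd would be "qsub <qsub_options> ./script.plx".

```perl
use strict;
use warnings;

# Run a command, swallowing its output unless it exits non-zero.
sub run_quietly {
    my ($cmd) = @_;
    # Single-string form goes through the shell, so "2>&1" works here.
    open(my $fh, '-|', "$cmd 2>&1") or die "cannot start '$cmd': $!";
    my @output = <$fh>;
    close($fh);              # close() waits for the child and sets $?
    my $status = $? >> 8;    # the command's exit code
    print STDERR @output if $status != 0;   # surface the noise only on error
    return $status;
}
```

Calling run_quietly() in place of the bare system() call keeps the shell quiet for successful submissions while still letting real qsub errors through.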

I do have an alternate solution for you, though. You could submit the jobs to SGE without -sync y and capture the job ID when qsub prints it. Then turn your summarization and results-collection code into a follow-on job, and submit it with a dependency on the completion of the first jobs. You can submit this final job with -sync y so your calling script waits for it to end. See the documentation for -hold_jid in the qsub man page.
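A sketch of that dependency scheme, assuming your SGE supports qsub's -terse flag (it makes qsub print just the job ID: "123" for a plain job, "123.1-100:1" for an array job). The parse_job_id helper and the collect_results.sh script name are illustrative, not from the question:

```perl
use strict;
use warnings;

# Extract the numeric job ID from qsub -terse output;
# returns undef if qsub printed something unexpected.
sub parse_job_id {
    my ($qsub_output) = @_;
    my ($jid) = $qsub_output =~ /^(\d+)/;
    return $jid;
}

# On a live cluster the flow would then look like this:
#
#   my @job_ids;
#   for my $opts (@qsub_option_sets) {
#       my $jid = parse_job_id(`qsub -terse $opts ./script.plx`);
#       push @job_ids, $jid if defined $jid;
#   }
#
#   # Only this final qsub uses -sync y, so the script blocks exactly once,
#   # and SGE holds the collection job until every listed job completes.
#   system 'qsub', '-sync', 'y', '-hold_jid', join(',', @job_ids),
#          './collect_results.sh';
```

This keeps all the per-task exit-code chatter out of your shell, since none of the array jobs are submitted with -sync.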

Also, rather than making your calling script decide when to submit the next job (up to your maximum), use SGE's -tc option to specify the maximum number of simultaneous jobs (note that -tc isn't in the man page, but it is in qsub's -help output). This depends on you using a new enough version of SGE to have -tc, of course.
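As a hedged sketch of what that looks like (the task count and limit are made-up numbers): a single array job of 1000 tasks where SGE itself, via -tc, allows at most 20 tasks to run at once, replacing the Parallel::ForkManager throttling entirely.

```perl
use strict;
use warnings;

# One array job, 1000 tasks, at most 20 running simultaneously.
my @qsub_cmd = ('qsub', '-t', '1-1000', '-tc', '20', './script.plx');

# On a live SGE cluster you would then run:
#   system(@qsub_cmd) == 0 or die "qsub failed: $?\n";
```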