I'm having a hard time figuring out why I can't launch commands in parallel using the LSF blaunch command:
for num in `seq 3`; do
blaunch -u JobHost ./cmd_${num}.sh &
done
Error message:
Oct 29 13:08:55 2011 18887 3 7.04 lsb_launch(): Failed while executing tasks.
Oct 29 13:08:55 2011 18885 3 7.04 lsb_launch(): Failed while executing tasks.
Oct 29 13:08:55 2011 18884 3 7.04 lsb_launch(): Failed while executing tasks.
Removing the ampersand (&) allows the commands to execute sequentially, but I am after parallel execution.
When executed within the context of bsub, a single invocation of blaunch -u <hostfile> <cmd> will take <cmd> and run it on all the hosts specified in <hostfile> in parallel, as long as those hosts are within the job's allocation.

What you're trying to do is use 3 separate invocations of blaunch to run 3 separate commands. I can't find it in the documentation, but some testing on a recent version of LSF shows that each individually executed task in such a job has a unique task ID stored in an environment variable called LSF_PM_TASKID. You can verify this in your version of LSF by having blaunch run a command that prints its environment on every host and checking that variable.

Now, what does this have to do with your question? You want to run ./cmd_$i.sh for i=1,2,3 in parallel through blaunch. To do this you can write a single script, which I'll call cmd.sh, whose only job is to run ./cmd_${LSF_PM_TASKID}.sh. You can then replace your for loop with a single invocation of blaunch that launches cmd.sh with the 'JobHost' host file.

This will run one instance of cmd.sh on each host listed in the file 'JobHost' in parallel, and each of these instances will run the shell script cmd_X.sh, where X is the value of $LSF_PM_TASKID for that particular task.

If there are exactly 3 hostnames in 'JobHost', then you will get 3 instances of cmd.sh, which will in turn lead to one instance each of cmd_1.sh, cmd_2.sh, and cmd_3.sh.
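
The wrapper described above could look roughly like the sketch below. It is not the exact script from my testing, just a minimal version under the assumption that LSF really does set LSF_PM_TASKID to 1, 2, 3, ... in each task of your job. The function name run_task and the error message are made up for illustration.

```shell
#!/bin/sh
# cmd.sh: wrapper meant to be started once per host by a single blaunch call.
# Assumption: LSF sets LSF_PM_TASKID to a different value (1, 2, 3, ...) in each task.

# Run the per-task script that matches the given task ID (hypothetical helper).
run_task() {
    "./cmd_$1.sh"
}

if [ -n "${LSF_PM_TASKID:-}" ]; then
    run_task "$LSF_PM_TASKID"
else
    # Outside blaunch the variable is absent, so there is nothing to dispatch.
    echo "LSF_PM_TASKID is not set; start this script through blaunch" >&2
fi
```

Inside the bsub job, the for loop would then shrink to a single line along the lines of: blaunch -u JobHost ./cmd.sh. To check beforehand whether your LSF version sets the variable at all, something like blaunch -u JobHost env piped through grep LSF_PM_TASKID should print one value per host.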