What could cause asymmetric throughput of MPI nonblocking messages?


I'm using MPI nonblocking messages to communicate between two tasks. The communication pattern is as follows: each task has a master thread that receives messages from the other task, plus 5 or so worker threads that do a computation and send messages to the other task. The master thread does nothing but loop, testing for incoming messages.

My problem is that while task 0 instantaneously receives everything sent from task 1 (the numbers of messages sent and received roughly match), task 1 only receives about 1/4 of the messages sent by task 0. After running for a minute, there are hundreds of thousands of outstanding messages.

Using PAPI, I've determined that task 1 seems to block on test and irecv. Its instruction throughput is only 0.03 instr/cycle, as opposed to >0.2 for the other task, and stopping the task in the debugger shows that it is trying to acquire a lock. However, the receive and test calls that block are not the ones for the "missing" messages but the ones for another, much rarer class of messages.

I realize it's hard to say what could cause this without actually trying the code, but I find it puzzling that there is such an asymmetry in the MPI performance. The task that can't keep up with its receives isn't failing for lack of trying: it really is spending all its time testing for incoming messages.

I'm using OpenMPI 1.5.3 with MPI_THREAD_MULTIPLE, and the communication is over sm (the two tasks are on the same node).

Any ideas on how to track this down would be appreciated.
