I've spent three days looking for an answer so I hope you'll bear with me if this has already been addressed and I've been mighty unlucky finding a solution. I'm using Fortran (eugh!) but this is a generic MPI query.
Scenario (simplified for this example):
- Processes 0 and 1 communicate with process 2 (but not with each other)
- 0 & 1 do lots of sends/receives
- 2 does lots of receives/process/sends (but each pair is done twice so as to pick up both 0 & 1)
- 0 & 1 will eventually stop - I know not when! - so I do an MPI_Send from each when appropriate using the rank of the 3rd process (filter_rank_id=2) and a special tag (c_tag_open_rcv=200), with a logical TRUE in the buffer (end_of_run). Like this:
CALL MPI_SEND(end_of_run, 1, MPI_LOGICAL, filter_rank_id, c_tag_open_rcv, mpi_coupling_comms, mpi_err)
The problem arises in process 2... it's busy doing its MPI_Recv/MPI_Send pairs and I cannot break out of it. I have set up a non-blocking receive for each of the other two processes and stored the request handles:
DO model_rank_id= 0, 1
    !Set up a non-blocking receive to get notification of end of model run for each model
    end_run = end_model_runs(model_rank_id) !this is an array of booleans initialised to FALSE
    CALL MPI_IRECV(end_run, 1, MPI_LOGICAL, model_rank_id, &
                   c_tag_open_rcv, coupling_comms, mpi_request_handle, mpi_err)
    !store the handle in an array
    request_handles(model_rank_id) = mpi_request_handle
END DO
where model_rank_id is the process number in the MPI communicator i.e. 0 or 1.
Later on, busy doing all those receive/send pairs, I always check whether anything's arrived in the buffer:
DO model_rank_id= 0, 1
    IF (end_model_runs(model_rank_id) .EQV. .FALSE.) THEN
        CALL MPI_TEST(request_handles(model_rank_id), run_complete, mpi_status, mpi_err)
        IF (run_complete .eqv. .FALSE.) THEN
            !do stuff... receive/process/send
        ELSE
            !run is complete
            !___________removed this as I realised it was incorrect__________
            !get the stop flag for the specific process
            CALL MPI_RECV(end_run, 1, MPI_LOGICAL, model_rank_id, &
                          c_tag_open_rcv, coupling_comms, mpi_err)
            !____________end_________________________________________________
            !store the stop flag so I can do a logical 'AND' on it and break out when
            !both processes have sent their message
            end_model_runs(model_rank_id) = end_run
        END IF
    END IF
END DO
Note that this snippet is contained in a loop which carries on until all the stop flags are TRUE.
I know it's fairly complex, but this can't be that hard, can it? If anyone can see the error that'd be fantastic, or even suggest a better way to do it.
Huge thanks in advance.
Your program is probably stuck in the MPI_RECV call. The reason is that a positive completion flag returned by MPI_TEST means that MPI_IRECV has already received the message. Unless the sender sends another message with the same tag, MPI_RECV will simply block and wait, in your case probably indefinitely. Apart from that, you are issuing two MPI_IRECV calls with the same receive buffer, which is probably not what you really want to do, since end_run = end_model_runs(model_rank_id) does not copy the address of the array element into end_run but rather its value.
Your code should look like this:
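A minimal, self-contained sketch of that structure, assuming exactly three processes and MPI_COMM_WORLD standing in for the coupling communicator. The names ierr, recv_status and stop_received are illustrative and not taken from the question, and the real receive/process/send work is reduced to comments:

```fortran
PROGRAM stop_flag_sketch
  USE mpi
  IMPLICIT NONE

  INTEGER, PARAMETER :: filter_rank_id = 2     ! rank of the third process, as in the question
  INTEGER, PARAMETER :: c_tag_open_rcv = 200   ! tag reserved for the stop message
  INTEGER :: coupling_comms, my_rank, n_ranks, ierr, model_rank_id
  INTEGER :: request_handles(0:1)              ! one request per model process
  INTEGER :: recv_status(MPI_STATUS_SIZE)
  LOGICAL :: end_model_runs(0:1)               ! one distinct receive buffer per model process
  LOGICAL :: stop_received(0:1)                ! assumption: separate bookkeeping flags
  LOGICAL :: end_of_run, run_complete

  CALL MPI_INIT(ierr)
  coupling_comms = MPI_COMM_WORLD              ! assumption: stands in for the real communicator
  CALL MPI_COMM_RANK(coupling_comms, my_rank, ierr)
  CALL MPI_COMM_SIZE(coupling_comms, n_ranks, ierr)
  IF (n_ranks /= 3) THEN
     IF (my_rank == 0) PRINT *, 'This sketch expects exactly 3 processes'
     CALL MPI_ABORT(coupling_comms, 1, ierr)
  END IF

  IF (my_rank /= filter_rank_id) THEN
     ! Model process (rank 0 or 1): the normal sends/receives would go here.
     ! When the run ends, send the stop message once.
     end_of_run = .TRUE.
     CALL MPI_SEND(end_of_run, 1, MPI_LOGICAL, filter_rank_id, &
                   c_tag_open_rcv, coupling_comms, ierr)
  ELSE
     end_model_runs = .FALSE.
     stop_received  = .FALSE.

     ! Post one non-blocking receive per model, each writing directly
     ! into its own array element and its own request handle.
     DO model_rank_id = 0, 1
        CALL MPI_IRECV(end_model_runs(model_rank_id), 1, MPI_LOGICAL, model_rank_id, &
                       c_tag_open_rcv, coupling_comms, request_handles(model_rank_id), ierr)
     END DO

     ! Main loop: carry on until both stop messages have arrived.
     DO WHILE (.NOT. ALL(stop_received))
        DO model_rank_id = 0, 1
           IF (.NOT. stop_received(model_rank_id)) THEN
              CALL MPI_TEST(request_handles(model_rank_id), run_complete, recv_status, ierr)
              IF (run_complete) THEN
                 ! The message is already in end_model_runs(model_rank_id);
                 ! no further MPI_RECV is needed (it would block).
                 stop_received(model_rank_id) = .TRUE.
              ELSE
                 ! do stuff... receive/process/send for this model
              END IF
           END IF
        END DO
     END DO

     PRINT *, 'Stop flags received: ', end_model_runs
  END IF

  CALL MPI_FINALIZE(ierr)
END PROGRAM stop_flag_sketch
```

With a typical MPI installation this would be built with the Fortran compiler wrapper and launched on three processes (for example mpif90 followed by mpirun -np 3; the exact command names depend on the MPI distribution). The separate stop_received array is a design choice of this sketch: it avoids reading a receive buffer while its request may still be active.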
As a side note, using your own identifiers that start with mpi_ is a terrible idea since those might clash with symbols provided by the MPI library. You should really treat mpi_ as a reserved prefix and never use it while naming your own variables, subroutines, etc. I've fixed that for you in the code above.