I'm trying to understand what behavior the MPI standard dictates for an MPI implementation (mine is OpenMPI 4.1.0) when it runs this code:
#include <iostream>
#include <mpi.h>

int main (int argc, char ** argv) {
    MPI_Init(&argc, &argv);
    int idProc, nbProc;
    MPI_Comm_rank(MPI_COMM_WORLD,&idProc);
    MPI_Comm_size(MPI_COMM_WORLD,&nbProc);
    if (idProc == 0 || idProc == 1){
        MPI_Group worldGroup;
        MPI_Comm_group(MPI_COMM_WORLD, &worldGroup);
        int* ranks = new int[2];
        ranks[0] = 0;
        ranks[1] = 0;
        MPI_Group intrfGroup;
        MPI_Group_incl(worldGroup, 2, ranks, &intrfGroup);
        MPI_Comm mpiComm;
        std::cout << "MPI_Comm_create_group before ! By proc " << idProc << std::endl;
        MPI_Comm_create_group(MPI_COMM_WORLD, intrfGroup, 0, &mpiComm);
        std::cout << "MPI_Comm_create_group after ! By proc " << idProc << std::endl;
        int result;
        MPI_Comm_compare(mpiComm, MPI_COMM_WORLD, &result);
        if (result == MPI_IDENT) {
            std::cout << "mpiComm is MPI_COMM_WORLD." << std::endl;
        } else {
            std::cout << "mpiComm is not MPI_COMM_WORLD." << std::endl;
        }
    }
    MPI_Finalize();
}
It prints:
[ACH@is227051 bin]$ mpiexec -n 4 Allreduce
MPI_Comm_create_group before ! By proc 1
MPI_Comm_create_group after ! By proc 1
MPI_Comm_create_group before ! By proc 0
[is227051:03157] *** An error occurred in MPI_Comm_compare
[is227051:03157] *** reported by process [3464888321,1]
[is227051:03157] *** on communicator MPI_COMM_WORLD
[is227051:03157] *** MPI_ERR_COMM: invalid communicator
[is227051:03157] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[is227051:03157] *** and potentially your MPI job)
I tried changing the condition to if (idProc == 0)
so that only proc 0, which is the only process listed in the ranks array, enters the if, but when I do that it just hangs at:
[ACH@is227051 bin]$ mpiexec -n 4 Allreduce
MPI_Comm_create_group before ! By proc 0
What I'm trying to understand is how MPI behaves when you create a group/communicator from an int array of ranks that contains duplicates, in the simplest case the same rank listed twice. I could not find anything in the MPI standard document describing this.
OK, the answer was actually right there in the OpenMPI documentation:
Moreover, it is worth noting that it also says:
So the behavior I got from my code is completely normal and is now explained.
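For reference, MPI_Group_incl requires every entry of ranks to be a valid rank in the group and all entries to be distinct; otherwise the program is erroneous. To catch this kind of mistake before it reaches MPI, something like the following could be used. This is only a minimal sketch: ranksAreValidAndDistinct and withoutDuplicateRanks are hypothetical helper names I made up for illustration and are not part of MPI or OpenMPI.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical helper (not part of MPI): returns true when every entry of
// `ranks` lies in [0, groupSize) and no entry appears more than once.
bool ranksAreValidAndDistinct(const std::vector<int>& ranks, int groupSize) {
    std::unordered_set<int> seen;
    for (int r : ranks) {
        if (r < 0 || r >= groupSize) return false;  // not a valid rank in the group
        if (!seen.insert(r).second) return false;   // rank already listed
    }
    return true;
}

// Hypothetical helper (not part of MPI): drops repeated entries, keeping the
// first occurrence of each rank in its original order.
std::vector<int> withoutDuplicateRanks(const std::vector<int>& ranks) {
    std::unordered_set<int> seen;
    std::vector<int> unique;
    for (int r : ranks) {
        if (seen.insert(r).second) unique.push_back(r);
    }
    return unique;
}
```

In my code, the check would run on {0, 0} with the size obtained from MPI_Group_size on worldGroup, before calling MPI_Group_incl, and it would have rejected the array right away.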
I hope this helps other people in the future.