I am currently taking a course on efficient programming of supercomputers and multicore processors. Our latest assignment is to measure the latency of MPI_Send (that is, the time spent sending a zero-byte message). This alone would not be that hard, but we have to perform our measurements for the following cases:
- communication of processes in the same processor,
- same node but different processors,
- and for processes on different nodes.
I am wondering: how do I determine this? For processes on different nodes, I thought about hashing the name returned by MPI_Get_processor_name, which returns the identifier of the node the process is currently running on, and sending it as a tag. I also tried using sched_getcpu() to get the core ID, but it seems to return an incremental number even if the cores are hyperthreaded (in which case two of those numbers would refer to the same physical core). How do I go about this? I just need a concept for determining the localities, not complete code for the stated problem. Thank you!
In order to have both MPI processes placed on separate cores of the same socket, you should pass the following options to mpiexec:
In order to have both MPI processes on cores from different sockets, you should use:
In order to have them on two separate machines, you should create a host file that provides only one slot per node or use:
You can check the actual pinning/binding on Linux by calling sched_getaffinity() and examining the returned affinity mask. As an alternative, you could parse /proc/self/status and look for Cpus_allowed or Cpus_allowed_list. On Windows, GetProcessAffinityMask() returns the active affinity mask.
You could also ask Intel MPI to report the final pinning by setting I_MPI_DEBUG to 4, but it produces a lot of other output in addition to the pinning information. Look for lines that resemble the following: