MPI Send latency for different process localities

I am currently taking a course on efficient programming of supercomputers and multicore processors. Our latest assignment is to measure the latency of MPI_Send, i.e., the time spent sending a zero-byte message. The timing loop itself would not be that hard (a minimal ping-pong sketch is shown after the list), but we have to take our measurements for the following cases:

  • communication between processes on the same processor,
  • processes on the same node but different processors,
  • and processes on different nodes.
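
For reference, this is roughly the timing loop I have in mind (just a sketch; the iteration count is arbitrary and there is no warm-up phase):

/* Zero-byte ping-pong between ranks 0 and 1; latency = round trip / 2. */
#include <mpi.h>
#include <stdio.h>

#define NITER 10000

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double t0 = MPI_Wtime();
    for (int i = 0; i < NITER; i++) {
        if (rank == 0) {
            MPI_Send(NULL, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(NULL, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(NULL, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(NULL, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("latency: %g us\n", (t1 - t0) / (2.0 * NITER) * 1e6);

    MPI_Finalize();
    return 0;
}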

I am wondering: how do I determine this? For processes on different nodes, I thought about hashing the name returned by MPI_Get_processor_name, which returns the identifier of the node the process is currently running on, and sending it as a tag (a rough sketch of that idea follows below). I also tried using sched_getcpu() to get the core ID, but it seems to return an incrementing number even when the cores are hyperthreaded (so two processes could actually be running on the same physical core). How do I go about this? I just need a concept for determining the localities, not complete code for the stated problem. Thank you!
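
For completeness, this is what I had in mind for the node check (untested sketch; it only compares ranks 0 and 1 and assumes at least two ranks):

/* Gather every rank's processor name; ranks that report the same
   name are running on the same node. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    char name[MPI_MAX_PROCESSOR_NAME];
    int len, size, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    memset(name, 0, sizeof(name));
    MPI_Get_processor_name(name, &len);

    char *all = malloc((size_t)size * MPI_MAX_PROCESSOR_NAME);
    MPI_Allgather(name, MPI_MAX_PROCESSOR_NAME, MPI_CHAR,
                  all, MPI_MAX_PROCESSOR_NAME, MPI_CHAR, MPI_COMM_WORLD);

    if (rank == 0) {
        int same = (strcmp(&all[0], &all[MPI_MAX_PROCESSOR_NAME]) == 0);
        printf("ranks 0 and 1 on the same node: %s\n", same ? "yes" : "no");
    }

    free(all);
    MPI_Finalize();
    return 0;
}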

Best answer:

In order to have both MPI processes placed on separate cores of the same socket, you should pass the following options to mpiexec:

-genv I_MPI_PIN=1 -genv I_MPI_PIN_DOMAIN=core -genv I_MPI_PIN_ORDER=compact

In order to have both MPI processes on cores from different sockets, you should use:

-genv I_MPI_PIN=1 -genv I_MPI_PIN_DOMAIN=core -genv I_MPI_PIN_ORDER=scatter

In order to have them on two separate machines, you should create a host file that provides only one slot per node or use:

-perhost 1 -genv I_MPI_PIN=1 -genv I_MPI_PIN_DOMAIN=core
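
For example, a complete launch line for the intra-socket case could look like the following (the binary name ./latency is just a placeholder for your benchmark):

mpiexec -n 2 -genv I_MPI_PIN=1 -genv I_MPI_PIN_DOMAIN=core -genv I_MPI_PIN_ORDER=compact ./latency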

You can check the actual pinning/binding on Linux by calling sched_getaffinity() and examining the returned affinity mask. As an alternative, you could parse /proc/self/status and look for Cpus_allowed or Cpus_allowed_list. On Windows, GetProcessAffinityMask() returns the active affinity mask.
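
For instance, a small Linux-only sketch that prints which CPUs each rank is allowed to run on:

/* Print the affinity mask of the calling process (pid 0 = self). */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    cpu_set_t mask;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    CPU_ZERO(&mask);
    sched_getaffinity(0, sizeof(mask), &mask);

    printf("rank %d allowed on CPUs:", rank);
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &mask))
            printf(" %d", cpu);
    printf("\n");

    MPI_Finalize();
    return 0;
}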

You could also ask Intel MPI to report the final pinning by setting I_MPI_DEBUG to 4, but it produces a lot of other output in addition to the pinning information. Look for lines that resemble the following:

[0] MPI startup(): 0       1234     node100  {0}
[0] MPI startup(): 1       1235     node100  {1}