How do nodes communicate in Open MPI?

I am able to run an Open MPI job on multiple nodes under ssh. Everything looks good, but I find that I do not know much about what is really happening. So, how do nodes communicate in Open MPI? The job spans multiple nodes, so it cannot be shared memory. It also does not seem to be TCP or UDP, because I have not configured any ports. Can anyone describe what happens when a message is sent and received between two processes on two nodes? Thanks!
Open MPI is built on top of a framework of frameworks called the Modular Component Architecture (MCA). There are frameworks for different activities such as point-to-point communication, collective communication, parallel I/O, remote process launch, etc. Each framework is implemented as a set of components that provide different implementations of the same public interface.
Whenever the services of a specific framework are requested for the first time, e.g., those of the Byte Transfer Layer (BTL) or the Matching Transport Layer (MTL), both of which transfer messages between the ranks, MCA enumerates the various components capable of fulfilling the request and tries to instantiate them. Some components have specific requirements of their own, e.g., they require specific hardware to be present, and fail to instantiate if those aren't met. All components that instantiate successfully are scored, and the one with the best score is chosen to carry out the request and other similar requests. Thus, Open MPI is able to adapt itself to different environments with very little configuration on the user's side.
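You can watch this selection process happen. A minimal sketch, assuming a working Open MPI installation (./a.out stands for any MPI program; the exact log format varies between versions):

    # Ask the btl framework to log component selection at high verbosity.
    # Open MPI prints which components are considered, which fail to
    # initialize, and which one is finally selected for each peer.
    mpiexec --mca btl_base_verbose 100 -n 2 ./a.out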
For communication between different ranks, the BTL and MTL frameworks provide multiple implementations, and the available set depends heavily on the Open MPI version and on how it was built. The ompi_info tool can be used to query the library configuration.
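For illustration, the relevant part of its output might look roughly like this on a machine with an InfiniBand-equipped build (component names and version numbers are indicative only; your build will differ):

    $ ompi_info | grep -E 'MCA (btl|mtl)'
                 MCA btl: openib (MCA v2.1.0, API v3.0.0, Component v3.1.2)
                 MCA btl: self (MCA v2.1.0, API v3.0.0, Component v3.1.2)
                 MCA btl: sm (MCA v2.1.0, API v3.0.0, Component v3.1.2)
                 MCA btl: tcp (MCA v2.1.0, API v3.0.0, Component v3.1.2)
                 MCA btl: vader (MCA v2.1.0, API v3.0.0, Component v3.1.2)
                 MCA mtl: psm (MCA v2.1.0, API v2.0.0, Component v3.1.2)
                 MCA mtl: ofi (MCA v2.1.0, API v2.0.0, Component v3.1.2)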
The different components listed here are:

- openib: uses InfiniBand verbs to communicate over InfiniBand networks, which is one of the most widespread high-performance communication fabrics for clusters nowadays, and over other RDMA-capable networks such as iWARP or RoCE
- sm: uses shared memory to communicate on the same node
- tcp: uses TCP/IP to communicate over any network that provides a sockets interface
- vader: similarly to sm, provides shared-memory communication on the same node
- self: provides efficient self-communication
- psm: uses the PSM library to communicate over networks derived from PathScale's InfiniBand variant, such as Intel Omni-Path (r.i.p.)
- ofi: alternative InfiniBand transport that uses OpenFabrics Interfaces (OFI) instead of verbs

The first time rank A on hostA wants to talk to rank B on hostB, Open MPI will go through the list of modules. self only provides self-communication and will be excluded. sm and vader will get excluded since they only provide communication on the same node. If your cluster is not equipped with a high-performance network, the most likely candidate to remain is tcp, because there is literally no cluster node that doesn't have some kind of Ethernet connection to it.
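You can also influence this selection yourself. A hedged sketch (host names and program name are placeholders): restricting a run to the self and tcp components forces communication over plain TCP/IP even when a faster fabric is present, which is handy for troubleshooting:

    # Allow only the self and tcp BTL components for this run.
    mpiexec --mca btl self,tcp -n 2 --host hostA,hostB ./a.out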
The tcp component probes all network interfaces that are up and notes their network addresses. It opens listening TCP ports on all of them and publishes this information in a central repository (usually managed by the mpiexec process used to launch the MPI program). When the MPI_Send call in rank A requests the services of tcp in order to send a message to rank B, the component looks up the information published by rank B and selects all IP addresses that are in any of the networks that hostA is part of. It then tries to create one or more TCP connections, and upon success the messages start flowing.
In most cases, you do not need to configure anything and the tcp component Just Works™. Sometimes, though, it may need additional configuration. For example, the default TCP port range may be blocked by a firewall and you may need to tell the component to use a different one. Or it may select network interfaces that are in the same network range but do not provide physical connectivity; a typical case is the virtual interfaces used by various hypervisors or container services. In this case, you have to tell tcp to exclude those interfaces.
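A hedged sketch of both fixes (btl_tcp_port_min_v4, btl_tcp_port_range_v4, and btl_tcp_if_exclude are actual MCA parameters of the tcp component; the port numbers and interface names below are placeholders for your system):

    # Restrict the tcp component to ports 10000-10999, e.g., a range
    # that is open in the firewall.
    mpiexec --mca btl_tcp_port_min_v4 10000 \
            --mca btl_tcp_port_range_v4 1000 \
            -n 2 --host hostA,hostB ./a.out

    # Skip virtual interfaces with no physical connectivity (keep the
    # loopback interface in the exclusion list when overriding it).
    mpiexec --mca btl_tcp_if_exclude lo,docker0,virbr0 \
            -n 2 --host hostA,hostB ./a.out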
Configuring the various MCA components is done by passing MCA parameters with the --mca param_name param_value command-line argument of mpiexec. You may query the list of parameters that a given MCA component understands, and their default values, with ompi_info --param framework component.
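For example, a query for the tcp BTL component might produce something along these lines (output abridged and illustrative; the exact parameters and formatting depend on the Open MPI version):

    $ ompi_info --param btl tcp
        MCA btl tcp: parameter "btl_tcp_if_include" (current value: "",
                     data source: default, level: 1 user/basic, type: string)
                     Comma-delimited list of devices and/or CIDR notation of
                     networks to use for MPI communication
        MCA btl tcp: parameter "btl_tcp_if_exclude" (current value:
                     "127.0.0.1/8,sppp", data source: default, level: 1
                     user/basic, type: string)
                     Comma-delimited list of devices and/or CIDR notation of
                     networks to NOT use for MPI communication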
Parameters have different levels, and by default ompi_info only shows parameters of level 1 (user/basic parameters). This can be changed with the --level N argument, which shows parameters up to level N. The levels go all the way up to 9, and parameters with higher levels are only needed in very advanced cases, such as fine-tuning the library or debugging issues. For example, btl_tcp_port_min_v4 and btl_tcp_port_range_v4, which are used in tandem to specify the port range for TCP connections, are parameters of level 2 (user/detail).
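To see them, raise the level in the query; a sketch under the same caveats as above (abridged, with illustrative default values):

    $ ompi_info --param btl tcp --level 2
        ...
        MCA btl tcp: parameter "btl_tcp_port_min_v4" (current value: "1024",
                     data source: default, level: 2 user/detail, type: int)
                     The minimum port where the TCP BTL will try to bind
        MCA btl tcp: parameter "btl_tcp_port_range_v4" (current value:
                     "64511", data source: default, level: 2 user/detail,
                     type: int)
                     The number of ports where the TCP BTL will try to bind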