Mpiexec taking forever to start (possible issue with MPIPoolExecutor)?


I am trying to run the following command:

mpiexec -n 1 python scratch.py

where scratch.py is a simple example provided here:

from mpi4py.futures import MPIPoolExecutor

def square(i):
    global initialized
    try:
        initialized
    except NameError:
        initialized = False
    if not initialized:
        print("expensive initialization")
        import time
        time.sleep(2)
        initialized = True
    return i**2

if __name__ == '__main__':
    with MPIPoolExecutor(2) as ex:
        for result in ex.map(square, range(7)):
            print(result)

The command hangs indefinitely.

ps aux | grep mpi gives:

username 16944 44.6 0.0 377280 14476 ? Rl 09:46 0:03 python -R -m mpi4py.futures.server

So I gather that the problem lies within MPIPoolExecutor. I also wondered whether a firewall might be blocking the connection, but systemctl status firewalld gives:

firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)

What exactly does "python -R -m mpi4py.futures.server" do, and why does it take forever?
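From the mpi4py documentation, my understanding is that mpi4py.futures.server is the worker-side entry point that MPIPoolExecutor launches to execute the submitted tasks, and that it can fall back on MPI's client/server machinery (MPI_Publish_name / MPI_Lookup_name), which with MPICH requires a running name server. A sketch of what that is supposed to look like, with the hydra_nameserver binary and the -nameserver flag taken from the MPICH docs (I have not verified that this path works here):

hydra_nameserver &
mpiexec -nameserver localhost -n 1 python scratch.py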

I have:

  • Python3 version 3.8.0
  • mpich version 4.2.0

all on Scientific Linux 7.4.

CPU details:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 158
Model name:            Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz
Stepping:              9
CPU MHz:               3300.000
CPU max MHz:           3500.0000
CPU min MHz:           800.0000
BogoMIPS:              6000.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              6144K
NUMA node0 CPU(s):     0-3

Finally, running python3 -R -m mpi4py.futures.server on its own yields the following error:

 Traceback (most recent call last):
  File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/people/hkaya/.local/lib/python3.8/site-packages/mpi4py/futures/server.py", line 14, in <module>
    main()
  File "/home/people/hkaya/.local/lib/python3.8/site-packages/mpi4py/futures/server.py", line 10, in main
    _lib.server_main()
  File "/home/people/hkaya/.local/lib/python3.8/site-packages/mpi4py/futures/_lib.py", line 1068, in server_main
    server_main_service()
  File "/home/people/hkaya/.local/lib/python3.8/site-packages/mpi4py/futures/_lib.py", line 1057, in server_main_service
    comm = server_accept(service, info)
  File "/home/people/hkaya/.local/lib/python3.8/site-packages/mpi4py/futures/_lib.py", line 1006, in server_accept
    MPI.Publish_name(service, port, info)
  File "mpi4py/MPI/Comm.pyx", line 2755, in mpi4py.MPI.Publish_name
mpi4py.MPI.Exception: MPI_ERR_INTERN: internal error
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
  Proc: [[32891,0],0]
  Errorcode: 1
1

1 Answer

Answered by Hikmet Emre Kaya:

As per the suggestions in the comments under my question, I double-checked and found that mpi4py was indeed linked against a different MPI library from the one my mpiexec belongs to. So I uninstalled mpi4py and reinstalled it against the MPICH libraries using this link, and now MPIPoolExecutor runs without any issues. Thanks everyone!
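For anyone who lands here with the same symptom, a quick sketch of how to check for such a mismatch and rebuild mpi4py from source against the intended MPICH (the MPICC variable and pip flags are the standard mpi4py build options; adjust paths for your system):

# Which MPI library is mpi4py actually linked against?
python3 -c "from mpi4py import MPI; print(MPI.Get_library_version())"

# Which MPI does the mpiexec on PATH belong to?
mpiexec --version

# If they disagree, rebuild mpi4py against the MPICH compiler wrapper
pip3 uninstall mpi4py
env MPICC=$(which mpicc) pip3 install --no-cache-dir --no-binary=mpi4py mpi4py

Note also that the mpi4py docs describe an alternative start-up mode, mpiexec -n 3 python -m mpi4py.futures scratch.py, which launches a static pool of workers up front instead of using dynamic process management, and so avoids the MPI_Publish_name path altogether.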