OpenMP multi-threading not working if OpenMPI set to use one or two MPI processor


I developed a code parallelized in a hybrid way with Open MPI + OpenMP. It works as I expect if 'enough' MPI processes are given; based on my tests so far, 'enough' roughly means more than two.

The problem I observe is that if the code is given only one or two MPI processes, the multi-threading via OpenMP does not work as expected: each process is stuck at 200% CPU usage (i.e., it uses only two threads), no more. It is very unclear to me why this happens.

Here is my environment:

Ubuntu 20.04.4 LTS, gfortran 13.2.0, openmpi 4.1.5

To make the issue reproducible, here is a toy code that replicates it:

    program parallel_example

       use OMP_LIB

       implicit none

       include 'mpif.h'

       integer :: i, j, k, n, ierror, size_Of_Cluster, process_Rank
       real    :: sum, x

       call MPI_INIT(ierror)
       call MPI_COMM_SIZE(MPI_COMM_WORLD, size_Of_Cluster, ierror)
       call MPI_COMM_RANK(MPI_COMM_WORLD, process_Rank, ierror)

       call omp_set_dynamic(.false.)
       call omp_set_num_threads(5)

       !$OMP PARALLEL
          print *, 'hello from thread:', OMP_GET_THREAD_NUM(), &
                   'of proc=', process_Rank
       !$OMP END PARALLEL

       ! Set the number of iterations
       n = 100000

       ! Initialize the sum
       sum = 0.0

       !$omp parallel do collapse(3) default(none) private(i, j, k, x) shared(sum, n)
       do i = 1, n
          do j = 1, n
             do k = 1, n
                ! print *, 'hello from thread:', OMP_GET_THREAD_NUM(), i, j, k
                x = 1.0 / (real(i) + real(j) + real(k))
                !$omp atomic
                sum = sum + x
             end do
          end do
       end do
       !$omp end parallel do

       print *, "The sum is: ", sum

       call MPI_Finalize(ierror)

    end program parallel_example
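As an aside (not related to the binding problem): the atomic update inside the triple loop serializes every addition across threads. OpenMP's reduction clause expresses the same sum without that contention by letting each thread accumulate a private partial sum. A minimal sketch of the modified directive, assuming the rest of the program is unchanged:

       ! Same triple loop, but let OpenMP combine per-thread partial sums
       !$omp parallel do collapse(3) default(none) private(i, j, k, x) &
       !$omp& shared(n) reduction(+:sum)
       do i = 1, n
          do j = 1, n
             do k = 1, n
                x = 1.0 / (real(i) + real(j) + real(k))
                sum = sum + x
             end do
          end do
       end do
       !$omp end parallel do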

The number of threads per MPI process is set to 5, so I expect each MPI process to show 500% CPU usage in 'top' on my Ubuntu machine.

These are the compile and run commands:

mpif90 -fopenmp test.F90 -o app.exe
mpirun -np 1 ./app.exe

If I use 'mpirun -np 1' or 'mpirun -np 2', the CPU usage is stuck at 200% and goes no higher. But if I launch more than two processes, for example 'mpirun -np 3', I finally see 500% CPU usage for each of those three MPI processes.
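For anyone diagnosing a similar symptom: Open MPI's --report-bindings option prints each rank's core binding at launch, which shows directly how many cores the OpenMP threads are confined to (the exact output format varies by Open MPI version):

mpirun --report-bindings -np 1 ./app.exe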

It is very unclear to me why I cannot get 500% CPU usage with one or two MPI processes. I am fairly sure I am missing something in my environment setup, but I don't know what is wrong. If anyone has knowledge on this, please consider sharing it.

  • Update: The answer has been identified: adding "--map-by node:pe=N" to the mpirun command resolved the issue, so even a single MPI process can use all five threads. The option is an Open MPI mapping convention: "pe=N" reserves N processing elements (cores) for each process, whereas by default Open MPI binds each process to a single core when two or fewer processes are launched, which appears to be what limited the threads here. I hope "--map-by node:pe=N" helps others who run into similar issues. Thank you for the comments!
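For concreteness, with the five threads requested above the fixed invocation would look like this (pe=5 reserves five cores per MPI process; the code's own omp_set_num_threads(5) call then has five cores to spread over):

mpirun --map-by node:pe=5 -np 1 ./app.exe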