Random number generation with MKL VSL does not run in parallel? [ fortran90 ]

I've implemented the code below, which generates vectors of random numbers using the MKL VSL library:

! ifort -mkl test1.f90 -cpp -openmp

include "mkl_vsl.f90"

#define ITERATION 1000000
#define LENGH 10000

program test
use mkl_vsl_type
use mkl_vsl
use mkl_service
use omp_lib
implicit none 

integer i,brng, method, seed, dm,n,errcode
real(kind=8) r(LENGH) , s
real(kind=8) a, b, start,endd
TYPE (VSL_STREAM_STATE) :: stream
integer(4) :: nt

!     ***** 

brng   = VSL_BRNG_SOBOL
method = VSL_RNG_METHOD_UNIFORM_STD
seed = 777

a = 0.0
b = 1.0
s = 0.0

!call omp_set_num_threads(4)
call omp_set_dynamic(0)
nt = omp_get_max_threads()

!     ***** 

print *,'max OMP threads number',nt

if (1 == omp_get_dynamic()) then
  print '(" Intel OMP may use less than "I0" threads for a large problem")', nt
else
  print '(" Intel OMP should use "I0" threads for a large problem")', nt
end if

if (1 == omp_get_max_threads()) print *, "Intel MKL does not employ threading" 

!call mkl_set_num_threads(4)
call mkl_set_dynamic(0)
nt = mkl_get_max_threads()

print *,'max MKL threads number',nt

if (1 == mkl_get_dynamic()) then
  print '(" Intel MKL may use less than "I0" threads for a large problem")', nt
else
  print '(" Intel MKL should use "I0" threads for a large problem")', nt
end if

if (1 == mkl_get_max_threads()) print *, "Intel MKL does not employ threading"      

!     ***** Initialize *****

      errcode=vslnewstream( stream, brng,  seed )

!     ***** Call RNG *****

start=omp_get_wtime()

do i=1,ITERATION 
      errcode=vdrnguniform( method, stream, LENGH, r, a, b ) 
      s = s + sum(r)/LENGH
end do      

endd=omp_get_wtime()    

!     ***** DEleting the stream *****      

      errcode=vsldeletestream(stream)

!     ***** 

print *, s/ITERATION, endd-start

end program test

I don't see any speedup when using, for instance, 4 or 32 threads.
I use the Intel compiler version 13.1.3 and compile with

ifort -mkl test1.f90 -cpp -openmp

It's like the random numbers are not generated in parallel.
Any hints here?

Thank you,

Éric.

There are 1 best solutions below

Your code doesn't contain any OpenMP directives to actually parallelise the work, so when it executes it runs on only one thread. It is not sufficient to use omp_lib and to scatter a few calls to functions such as omp_get_wtime around; you actually have to insert some worksharing directives.

If I run your code, as is, my performance monitor shows that only one thread is active, and your code reports

 max OMP threads number 16
 Intel OMP should use 16 threads for a large problem
 max MKL threads number 16
 Intel MKL should use 16 threads for a large problem
 0.499972674509302 11.2807227574035

If I simply wrap the loop in an OpenMP worksharing directive, like this

!$omp parallel do
do i=1,ITERATION 
      errcode=vdrnguniform( method, stream, LENGH, r, a, b ) 
      s = s + sum(r)/LENGH
end do      
!$omp end parallel do

then the performance monitor on my dual-quad-core-with-hyperthreading-PC shows that 16 threads are active and your program reports

 max OMP threads number 16
 Intel OMP should use 16 threads for a large problem
 max MKL threads number 16
 Intel MKL should use 16 threads for a large problem
 0.380979220384302 7.17352125150956

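I guess the hint I would offer is: study your favourite OpenMP tutorial, in particular the sections covering the parallel and do directives. I offer no warranty that the simple modification I have made does not break your program; in particular I don't guarantee that I haven't introduced a race condition. Note that the reported mean has moved from about 0.5 to about 0.38, which is what you would expect if the threads were racing on the shared variables s, r and stream.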

I leave you the exercise of determining whether the speed-up on going from 1 to 16 (hyper-)threads is acceptable, and of analysing why it appears to be so modest.
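
For what it's worth, here is a minimal sketch of how the loop might be made race-free. It is not tested against your setup and rests on several assumptions of mine: each thread gets its own stream and its own private buffer r, the sum s is combined with a reduction clause, and the generator is swapped from VSL_BRNG_SOBOL to the VSL_BRNG_MT2203 family, whose members are intended to be used as independent generators (one per thread). That swap changes the numbers you get, since MT2203 is pseudo-random rather than quasi-random. The program name and the parameters iterations and length are placeholders that replace the preprocessor macros in your code.

```fortran
! Sketch only: assumed to compile with the same MKL and OpenMP flags as in the question.
include "mkl_vsl.f90"

program test_parallel
use mkl_vsl_type
use mkl_vsl
use omp_lib
implicit none

! Placeholders replacing the ITERATION and LENGH macros of the question
integer, parameter :: iterations = 1000000
integer, parameter :: length = 10000

integer :: i, tid, brng, method, seed, errcode
real(kind=8) :: r(length), s, a, b, start, endd
type (VSL_STREAM_STATE) :: stream

method = VSL_RNG_METHOD_UNIFORM_STD
seed = 777
a = 0.0d0
b = 1.0d0
s = 0.0d0

start = omp_get_wtime()

! Every thread works on private copies of r, stream and errcode;
! the partial sums in s are added together when the region ends.
!$omp parallel private(i, tid, brng, errcode, r, stream) reduction(+:s)

tid = omp_get_thread_num()

! Assumption: one MT2203 family member per thread instead of SOBOL.
! tid must stay below the number of generators in the family.
brng = VSL_BRNG_MT2203 + tid
errcode = vslnewstream(stream, brng, seed)

!$omp do
do i = 1, iterations
   errcode = vdrnguniform(method, stream, length, r, a, b)
   s = s + sum(r)/length
end do
!$omp end do

! Each thread deletes the stream it created
errcode = vsldeletestream(stream)

!$omp end parallel

endd = omp_get_wtime()

print *, s/iterations, endd-start

end program test_parallel
```

With independent streams the printed mean should come out close to 0.5 again, whatever the thread count. If you need to keep a single generator rather than switch family, block-splitting the sequence with vslskipaheadstream is another option to look up in the VSL notes, provided the generator you choose supports it.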