changing thread number doesn't affect code

283 Views Asked by At

I am trying to learn xeon-phi , and while studying the Intel Xeon-Phi Coprocessor HPC book , I tried to run the code here. (from book)

The code uses openmp and 2 threads.

But the results I am taking are the same as running with 1 thread. ( no use of openmp at all )

I even used in mic different combinations but still the same:

export OMP_NUM_THREADS=2
export MIC_OMP_NUM_THREADS=124
export MIC_ENV_PREFIX=MIC

It seems that somehow openmp is not enabled?Am I missing something here?

The code using only 1 thread is here

I compiled using:

icc -mmic -openmp -qopt-report -O3 hello.c

Thanks!

3

There are 3 best solutions below

0
froth On BEST ANSWER

I am not sure exactly which book you are talking about, but perhaps this will help.

The code you show does not use the offload programming style and must be run natively on the the coprocessor, meaning you copy the executable to the coprocessor and run it there or you use the micnativeloadex utility to run the code from the host processor. You show that you know the code must be run natively because you compiled it with the -mmic option.

If you use micnativeloadex, then the number of omp threads on the coprocessor is set by executing "export MIC_OMP_NUM_THREADS=124" on the host. If you copy the executable to the coprocessor and then log in to run it there, the number of omp threads on the coprocessor is set by executing "export OMP_NUM_THREADS=124" on the coprocessor. If you use "export OMP_NUM_THREADS=2" on the coprocessor, you get only two threads; the MIC_OMP_NUM_THREADS environment variable is not used if you set it directly on the coprocessor.

I don't see any place in the code where it prints out the number of threads, so I don't know for sure how you determined the number of threads actually being used. I suspect you were using a tool like micsmc. However micsmc tells you how may cores are in use, not how many threads are in use.

By default, the omp threads are laid out in order, so that the first core would run threads 0,1,2,3, the second core would run threads 4,5,6,7 and so on. If you are using only two threads, both threads would run on the first core.

So, is that what you are seeing - not that you are using only one thread but instead that you are using only one core?

0
Sunny Gogar On

I was looking at the serial version of the code you are using. For the following lines:

        for(j=0; j<MAXFLOPS_ITERS; j++)  
        {
        //
        // scale 1st array and add in the 2nd array
        // example usage - y = mx + b;
        //
            for(k=0; k<LOOP_COUNT; k++)  
            {
                fa[k] = a * fa[k] + fb[k];
            }
         }

I see that here you do not scan the complete array. Instead you keep on updating the first 128 (LOOP_COUNT) elements of the array Fa. If you wish to compare this serial version to the parallel code you are referring to, then you will have to ensure that the program does same amount of work in both versions.

Thanks

0
user3582234 On

I noticed three things in your first program omp:

  1. the total floating point operations should reflect the number of threads doing the work. Therefore,

gflops = (double)( 1.0e-9*LOOP_COUNTMAXFLOPS_ITERSFLOPSPERCALC*numthreads);

  1. You harded code the number of thread = 2. If you want to use the OMP env variable, you should comment out the API "omp_set_num_threads(2);"

  2. After transferring the binary to the coprocessor, to set the OMP env variable in the coprocessor please use OMP_NUM_THREADS, and not MIC_OMP_NUM_THREADS. For example, if you want 64 threads to run your program in the coprocessor:

% ssh mic0

% export OMP_NUM_THREADS=64