OpenMP using loops and array reductions


I have written a program as follows:

#include "omp.h"
#include "stdio.h"

int main()
{
    int i, j, cnt[] = {0,0,0,0};
    #pragma omp parallel
    {
        int cnt_private[] = {0,0,0,0};
        #pragma omp for private(j)
        for(int i = 1 ; i <= 10 ; i++) {
            for(j = 1 ; j <= 10 ; j++) {
                int l= omp_get_thread_num();
                cnt_private[l]++;         
            }
            #pragma omp critical
            {   
               for(int m=0; m<3; m++){
                   cnt[m] = cnt_private[m];
               }
            }
           printf("%d %d %d %d %d\n",i,cnt[0],cnt[1],cnt[2],cnt[3]);
        }
     }
     return 0;
}

It should print, for each i, how many inner-loop iterations each thread has executed. Since only one thread handles a particular i, I expected each row to sum to 100. But I am getting output of the form:

1 10 0 0 0
2 20 0 0 0
3 30 0 0 0
7 0 0 10 0
8 0 0 20 0
9 0 0 0 0
10 0 0 0 0
4 0 10 0 0
5 0 20 0 0
6 0 30 0 0

Where is the problem? Could it be in my fundamental understanding of OpenMP, or is my reduction process wrong? (I use the GNU gcc compiler on a 4-core machine.) Compilation steps:

g++ -fopenmp BlaBla.cpp
export OMP_NUM_THREADS=4
./a.out  

Best answer

I do not see why the sum of each row should be 100.

You declared cnt_private inside the parallel region, which makes it private to each thread:

#pragma omp parallel
{
    int cnt_private[] = {0,0,0,0};
    // ...
}

As such, the counts stored in it are not shared between threads. When thread l runs, only cnt_private[l] is incremented; all other entries stay at zero. Then you assign the contents of cnt_private to cnt, which is shared. That assignment copies every zero entry as well, overwriting whatever the other threads had stored there!

#pragma omp critical
{   
    for(int m=0; m<4; m++){ // I guess you want 'm<4' for the number of threads
        cnt[m] = cnt_private[m];
    }
}
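
One way to avoid the overwrite, sketched here under a few assumptions: the merge uses += instead of =, and it runs once per thread after the work-sharing loop rather than once per i. The names count_iterations and MAX_THREADS and the num_threads(4) clause are illustrative choices, not part of the original program. The serial fallback for omp_get_thread_num is only there so the sketch also builds without -fopenmp.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallback (assumption): lets the sketch build without -fopenmp.
static int omp_get_thread_num() { return 0; }
#endif

// Assumed upper bound on the team size, matching OMP_NUM_THREADS=4 in the question.
const int MAX_THREADS = 4;

// Fills cnt[t] with the number of inner-loop iterations executed by thread t.
void count_iterations(int cnt[MAX_THREADS]) {
    for (int m = 0; m < MAX_THREADS; m++) cnt[m] = 0;

    #pragma omp parallel num_threads(MAX_THREADS)
    {
        int cnt_private[MAX_THREADS] = {0, 0, 0, 0};  // one copy per thread
        const int l = omp_get_thread_num();
        assert(l >= 0 && l < MAX_THREADS);

        #pragma omp for
        for (int i = 1; i <= 10; i++) {
            for (int j = 1; j <= 10; j++) {
                cnt_private[l]++;
            }
        }

        // Merge once per thread. Using += means the zero entries of this
        // thread's copy leave the other threads' totals untouched.
        #pragma omp critical
        {
            for (int m = 0; m < MAX_THREADS; m++) {
                cnt[m] += cnt_private[m];
            }
        }
    }
}
```

Calling count_iterations(cnt) and printing cnt afterwards should give four numbers that add up to 100. How the 100 is split between the threads depends on the schedule the runtime picks.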

With i ranging from 1 to 10 and the program using 4 threads, each thread gets 2 to 3 values of i. As such, I would expect the sum of each column to be either 30 (10+20) or 60 (10+20+30).
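
Since the question asks about reductions: OpenMP 4.5 and later accept array sections in a reduction clause (GCC 6 and newer implement this), so the per-thread copy and the critical section can be left to the runtime. The following is a hedged sketch that counts how many values of i each thread receives. The names count_outer_iterations and MAX_THREADS are hypothetical, and schedule(static) with num_threads(4) is an assumption about how the loop is run.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallback (assumption): lets the sketch build without -fopenmp.
static int omp_get_thread_num() { return 0; }
#endif

// Assumed team size, matching OMP_NUM_THREADS=4 in the question.
const int MAX_THREADS = 4;

// Fills per_thread[t] with the number of values of i handled by thread t.
void count_outer_iterations(int per_thread[MAX_THREADS]) {
    for (int m = 0; m < MAX_THREADS; m++) per_thread[m] = 0;

    // Each thread gets a zero-initialized private copy of the array section;
    // the runtime adds all copies into per_thread when the loop ends.
    #pragma omp parallel for num_threads(MAX_THREADS) schedule(static) \
        reduction(+ : per_thread[:MAX_THREADS])
    for (int i = 1; i <= 10; i++) {
        const int l = omp_get_thread_num();
        assert(l >= 0 && l < MAX_THREADS);
        per_thread[l] += 1;
    }
}
```

With four threads and a static schedule, the entries of per_thread would typically each be 2 or 3 and add up to 10, which matches the split described above. The exact split is left to the implementation.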