I am writing a magic square program in C with OpenMP to try to make it faster. However, my times are higher than those of the sequential implementation.
I don't know whether I am using the omp for directive correctly, and I can't tell whether that for loop is actually being spread across threads the way it is supposed to be in order to get faster, or whether I should use something else. Can someone help me?
My sample code:
#pragma omp parallel private(i,j)
{
    // sum of the main diagonal
    #pragma omp for
    for (i = 0; i < size; i++)
        sum += matrix[i][i];

    // sum of the secondary diagonal
    #pragma omp for
    for (i = 0; i < size; i++)
        sumAux += matrix[i][size-i-1];

    //printf("\nMain diagonal sum %i and secondary %i\n", sum, sumAux);

    //------------------------------ ROWS AND COLUMNS ------------------------------
    #pragma omp for
    for (int i = 0; i < size; i++) {
        sumRow = 0;
        sumCol = 0;
        for (int j = 0; j < size; j++) {
            sumRow += matrix[i][j];
            sumCol += matrix[j][i];
        }
        //printf("Row sum %i\n", sumRow);
        //printf("Column sum %i\n", sumCol);
    }
}
//------------------------------ PRINTS ------------------------------
if (sum == sumCol && sum == sumRow && sum == sumAux) {
    printf("Magic square with sum = %d \n", sum);
} else {
    printf("Not a magic square \n");
}
return 0;
}
The code has several race conditions, namely the updates of the variables sum, sumAux, sumRow, and sumCol.
Moreover, the row/column loop is just wrong: sumRow and sumCol are reset on every iteration of the outer loop and never compared against anything inside it, so after the loop they only hold the sums of the last row and the last column.
Therefore, you should check that the sum of the values of each row and the sum of the values of each column produces the same result as the sum of the diagonals (from the previous step). Moreover, you can optimize your sequential code by exiting early in case the constraints are not met:
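A minimal sketch of that check, assuming sum and sumAux already hold the two diagonal sums from the previous step (names taken from the question; the messages are illustrative):

// The two diagonals must match before rows and columns are even worth checking.
if (sum != sumAux) {
    printf("Not a magic square \n");
    return 0;
}
for (int i = 0; i < size; i++) {
    int sumRow = 0;
    int sumCol = 0;
    for (int j = 0; j < size; j++) {
        sumRow += matrix[i][j];
        sumCol += matrix[j][i];
    }
    // Exit early: a single mismatching row or column settles it.
    if (sumRow != sum || sumCol != sum) {
        printf("Not a magic square \n");
        return 0;
    }
}
printf("Magic square with sum = %d \n", sum);
return 0;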
To solve the aforementioned race conditions you should use the OpenMP reduction clause:
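For example, the two diagonal loops could be written as follows (a sketch, reusing the variable names from the question):

// Each thread accumulates into a private copy of sum/sumAux;
// OpenMP combines the private copies at the end of each loop.
#pragma omp parallel
{
    #pragma omp for reduction(+:sum)
    for (int i = 0; i < size; i++)
        sum += matrix[i][i];

    #pragma omp for reduction(+:sumAux)
    for (int i = 0; i < size; i++)
        sumAux += matrix[i][size-i-1];
}

sumRow and sumCol need no reduction at all: declaring them inside the outer loop makes them local to each iteration (and hence to each thread), which removes those races entirely.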
Introducing OpenMP into a code, or parallelism for that matter, will not magically make your code faster. TL;DR: the work done in parallel should be big enough to overcome the overhead of the parallelism (e.g., thread creation, synchronization, and so on). To do that, you first need to increase the size of the parallel tasks, namely by increasing the input size to values that justify the aforementioned overhead of the parallelism.
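To see where that break-even point is, you can time both versions for growing inputs. A sketch using omp_get_wtime; run is a hypothetical wrapper around whichever version (sequential or parallel) you are timing:

#include <stdio.h>
#include <omp.h>

// Hypothetical helper: performs one magic-square check
// for a freshly generated matrix of the given size.
extern void run(int size);

int main(void) {
    // Time the check for increasingly large inputs; the parallel
    // version only pays off once the work dwarfs the thread overhead.
    for (int size = 64; size <= 8192; size *= 2) {
        double start = omp_get_wtime();
        run(size);
        double end = omp_get_wtime();
        printf("size %5d: %f s\n", size, end - start);
    }
    return 0;
}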