So I have this function that I have to parallelize with OpenMP static scheduling for n threads
void computeAccelerations(){
int i,j;
for(i=0;i<bodies;i++){
accelerations[i].x = 0; accelerations[i].y = 0; accelerations[i].z = 0;
for(j=0;j<bodies;j++){
if(i!=j){
//accelerations[i] = addVectors(accelerations[i],scaleVector(GravConstant*masses[j]/pow(mod(subtractVectors(positions[i],positions[j])),3),subtractVectors(positions[j],positions[i])));
vector sij = {positions[i].x-positions[j].x,positions[i].y-positions[j].y,positions[i].z-positions[j].z};
vector sji = {positions[j].x-positions[i].x,positions[j].y-positions[i].y,positions[j].z-positions[i].z};
double mod = sqrt(sij.x*sij.x + sij.y*sij.y + sij.z*sij.z);
double mod3 = mod * mod * mod;
double s = GravConstant*masses[j]/mod3;
vector S = {s*sji.x,s*sji.y,s*sji.z};
accelerations[i].x+=S.x;accelerations[i].y+=S.y;accelerations[i].z+=S.z;
}
}
}
}
I tried to do something like:
void computeAccelerations_static(int num_of_threads){
int i,j;
#pragma omp parallel for num_threads(num_of_threads) schedule(static)
for(i=0;i<bodies;i++){
accelerations[i].x = 0; accelerations[i].y = 0; accelerations[i].z = 0;
for(j=0;j<bodies;j++){
if(i!=j){
//accelerations[i] = addVectors(accelerations[i],scaleVector(GravConstant*masses[j]/pow(mod(subtractVectors(positions[i],positions[j])),3),subtractVectors(positions[j],positions[i])));
vector sij = {positions[i].x-positions[j].x,positions[i].y-positions[j].y,positions[i].z-positions[j].z};
vector sji = {positions[j].x-positions[i].x,positions[j].y-positions[i].y,positions[j].z-positions[i].z};
double mod = sqrt(sij.x*sij.x + sij.y*sij.y + sij.z*sij.z);
double mod3 = mod * mod * mod;
double s = GravConstant*masses[j]/mod3;
vector S = {s*sji.x,s*sji.y,s*sji.z};
accelerations[i].x+=S.x;accelerations[i].y+=S.y;accelerations[i].z+=S.z;
}
}
}
It comes naturally to just add the #pragma omp parallel for num_threads(num_of_threads) schedule(static) but it isn't correct.
I think there is some kind of false sharing with the accelerations[i] but I don't know how to approach it. I appreciate any kind of help. Thank you.
In your loop nest, only the iterations of the outer loop are parallelized. Because
iis the loop-control variable, each thread gets its own, private copy, but as a matter of style, it would be better to declareiin the loop control block.jis another matter. It is declared outside the parallel region and it is not the control variable of a parallelized loop. As a result, it is shared among the threads. Because each of the threads executingi-loop iterations manipulates shared variablej, you have a huge problem with data races. This would be resolved (among other alternatives) by moving the declaration ofjinto the parallel region, preferrably into the control block of its associated loop.Overall, then:
Note also that computing
sjiappears to be wasteful, as in mathematical terms it is just-sij, and neithersjinorsijis modified. I would probably reduce the above to something more like this: