OpenMP parallelization with array elements


I've been playing around with OpenMP, and am trying to see if I can get a speedup in a particular bit of C++ code.

    #pragma omp parallel for
    for (Index j=alignedSize; j<size; ++j)
    {
      res[j] = cj.pmadd(lhs0(j), pfirst(ptmp0), res[j]);
      res[j] = cj.pmadd(lhs1(j), pfirst(ptmp1), res[j]);
      res[j] = cj.pmadd(lhs2(j), pfirst(ptmp2), res[j]);
      res[j] = cj.pmadd(lhs3(j), pfirst(ptmp3), res[j]);
    }

I'm a complete newbie with OpenMP so be gentle with me, but could someone shed some light on why this code ends up doubling the execution time rather than speeding it up?

I'm running with 4 cores, just in case that matters.


There are 2 best solutions below

1
On BEST ANSWER

What is the size of a res entry? If it's less than the size of a cache line, then threads writing to neighbouring entries end up contending for the same line, so it's likely false sharing.
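To make that check concrete, here is a rough sketch of the arithmetic. The 64-byte line size and the helper names are my assumptions, not something from the question, and the iteration count in the example is made up since the question doesn't give one.

```cpp
#include <cstddef>

// Assumption: 64-byte cache lines, common on current x86 CPUs.
constexpr std::size_t kCacheLineBytes = 64;

// Bytes written by each thread when n iterations are split into
// contiguous blocks across `threads` threads (static scheduling).
// Hypothetical helper, for illustration only.
constexpr std::size_t bytes_per_thread(std::size_t n, std::size_t threads,
                                       std::size_t elem_bytes) {
    return ((n + threads - 1) / threads) * elem_bytes;
}

// True when a thread's block is smaller than one cache line, so
// neighbouring threads are bound to write into the same line.
constexpr bool likely_false_sharing(std::size_t n, std::size_t threads,
                                    std::size_t elem_bytes) {
    return bytes_per_thread(n, threads, elem_bytes) < kCacheLineBytes;
}
```

For example, a loop of 3 iterations over 4-byte entries on 4 threads gives each thread at most 4 bytes, far below one line, so `likely_false_sharing(3, 4, 4)` is true.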

0
On

As a bare minimum on a typical CPU, each thread should work on chunks of at least 128 bytes, and even then you would need a unified last-level cache.
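One way this could look in code, as a sketch only: size the chunks of a static schedule so each covers at least 128 bytes, and skip the parallel region for short loops. The loop body is a plain stand-in for the `pmadd` calls in the question, and `elems_per_chunk`, `axpy_chunked` and the 4096 threshold are hypothetical choices of mine that you would need to tune by measuring.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: number of elements of type T needed to cover
// at least `bytes` bytes (128 by default, per the suggestion above).
template <typename T>
constexpr std::size_t elems_per_chunk(std::size_t bytes = 128) {
    return (bytes + sizeof(T) - 1) / sizeof(T);
}

// Assumed threshold below which the loop stays serial; tune by measuring.
constexpr long kMinParallelIters = 4096;

// Stand-in for the loop in the question: res[j] += a * x[j].
void axpy_chunked(std::vector<float>& res, const std::vector<float>& x, float a) {
    const long n = static_cast<long>(res.size());
    const int chunk = static_cast<int>(elems_per_chunk<float>());
    // Each thread gets whole chunks of at least 128 bytes; the if clause
    // keeps short loops on a single thread to avoid the fork/join cost.
    #pragma omp parallel for schedule(static, chunk) if(n >= kMinParallelIters)
    for (long j = 0; j < n; ++j) {
        res[j] += a * x[j];
    }
}
```

Compile with OpenMP enabled (for example `-fopenmp` on GCC or Clang); without it the pragma is ignored and the loop simply runs serially.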