I am working on getting a vector and matrix class parallelized and have run into an issue. Any time I have a loop in the form of
for (int i = 0; i < n; i++) b[i] += a[i] ;
the compiler reports a data dependency and will not parallelize the loop. The Intel compiler is smart enough to handle this without any pragmas. (I would like to avoid the pragma that disables the dependency check, both because of the vast number of loops similar to this one and because the real cases are more complicated than this, so I would like the compiler to keep checking in case a dependency really does exist.)
Does anyone know of a compiler flag for the PGI compiler that would allow this?
Thank you,
Justin
Edit: fixed an error in the for loop. I wasn't copy-pasting an actual loop.
I think the problem is you're not using the restrict keyword in these routines, so the C compiler has to worry about pointer aliasing. Compiling this program:
with the PGI compiler:
gives us the information that the dbpa() routine without the restrict keyword wasn't parallelized, but the dbpa_restict() routine was.

Really, for this sort of stuff, though, you're better off just using OpenMP (or TBB or ABB or...) rather than trying to convince the compiler to autoparallelize for you; probably better still is just to use existing linear algebra packages, either dense or sparse, depending on what you're doing.
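For illustration, here is a minimal sketch of the three variants discussed above. It is not the program from this answer: the function names add_plain, add_restrict and add_omp are hypothetical, and the loop body is just the one from the question. With PGI, something along the lines of pgcc -c -Mconcur -Minfo should report which loops were autoparallelized, but treat that invocation as an assumption and check the compiler's documentation for your version.

```c
/* Hypothetical names for illustration; not the original answer's code. */

/* No restrict: a and b may point into the same array, so the compiler
   has to assume that storing to b[i] could modify a later a[j]. */
void add_plain(int n, const double *a, double *b)
{
    for (int i = 0; i < n; i++)
        b[i] += a[i];
}

/* C99 restrict: the caller promises that a and b do not overlap, which
   removes the possible dependency between loop iterations. */
void add_restrict(int n, const double *restrict a, double *restrict b)
{
    for (int i = 0; i < n; i++)
        b[i] += a[i];
}

/* Explicit OpenMP: the programmer, not the compiler, asserts that the
   iterations are independent. The pragma is ignored when OpenMP is not
   enabled, and the function then runs serially. */
void add_omp(int n, const double *a, double *b)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        b[i] += a[i];
}
```

Note that restrict and the OpenMP pragma are both promises made by the programmer. If the arrays passed in really do overlap, the compiler will not catch it, so these belong only on routines whose callers are known to pass distinct arrays.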