Using multiple pragmas on the same for-loop for auto-vectorization on GCC and ICC


When there is a simple loop running on simple arrays,

for(int i=0;i<16;i++)
{
       a[i]=b[i]+c[i];
}

GCC and ICC respond to different pragmas. After experimenting, I observed that ICC benefits from this:

#pragma vector always vectorlength(16)
for(int i=0;i<16;i++)
{
       a[i]=b[i]+c[i];
}

and GCC benefits from this:

#pragma GCC ivdep
for(int i=0;i<16;i++)
{
       a[i]=b[i]+c[i];
}

What is the right approach to support both compilers? Something like this:

#pragma vector always vectorlength(16)
#pragma GCC ivdep
for(int i=0;i<16;i++)
{
       a[i]=b[i]+c[i];
}

or should I use #define macros? (I'm not fond of macros, but I can use them if no other option is left.)

I'm trying to emulate #pragma omp simd safelen(16) on platforms that do not have OpenMP. The closest pragmas I have found are GCC ivdep and vector always, but they are still not as fast as the OpenMP pragma. I am probably missing some other pragmas.
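For illustration, the macro route could look roughly like the sketch below. It is only a sketch under assumptions: SIMD_LOOP and add16 are made-up names, the element type is assumed to be float, and the pragma strings are simply the ones quoted in this question. It relies on the standard _Pragma operator, which allows a pragma to be produced by a macro, and it tests for ICC before GCC because ICC also defines __GNUC__.

```cpp
// Hypothetical macro, not from any library: chooses one loop hint per compiler.
// _Pragma is the standard operator form of #pragma, usable inside macros.
#if defined(_OPENMP)
#  define SIMD_LOOP _Pragma("omp simd safelen(16)")
#elif defined(__INTEL_COMPILER)          // check ICC first: it also defines __GNUC__
#  define SIMD_LOOP _Pragma("vector always vectorlength(16)")
#elif defined(__GNUC__) && !defined(__clang__)
#  define SIMD_LOOP _Pragma("GCC ivdep")
#else
#  define SIMD_LOOP                      // unknown compiler: no hint
#endif

// Hypothetical helper wrapping the loop from the question.
// The element type (float) is an assumption; the question does not state it.
inline void add16(float* a, const float* b, const float* c)
{
    SIMD_LOOP
    for (int i = 0; i < 16; i++)
    {
        a[i] = b[i] + c[i];
    }
}
```

With this layout each compiler sees only the pragma it understands, so stacking both pragmas on one loop (and the unknown-pragma warnings that can come with it) is avoided. Whether the chosen hint actually matches the speed of the OpenMP pragma is a separate question that this sketch does not settle.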

  • a, b and c are simple arrays on the same stack, and they are 64-byte aligned.
  • the function has __attribute__((always_inline)), which gives ICC a 4x speedup (but it is still slower than GCC by 50%)
  • ICC flags: -std=c++14 -xCORE-AVX512 -qopt-zmm-usage=high -O3 -lgomp -fmath-errno -mprefer-vector-width=512 -ftree-vectorize -lpthread
  • GCC flags: -std=c++14 -march=cascadelake -fmath-errno -mavx512f -O3 -lgomp -mprefer-vector-width=512 -ftree-vectorize -lpthread

Lastly, why is there no #pragma vector always equivalent for GCC?
