When there is a simple loop running on simple arrays,
for(int i=0;i<16;i++)
{
a[i]=b[i]+c[i];
}
GCC and ICC respond differently to vectorization pragmas. Experimenting with them, I observed that ICC benefits from this:
#pragma vector always vectorlength(16)
for(int i=0;i<16;i++)
{
a[i]=b[i]+c[i];
}
and GCC benefits from this:
#pragma GCC ivdep
for(int i=0;i<16;i++)
{
a[i]=b[i]+c[i];
}
What is the right approach to support both compilers? Something like this:
#pragma vector always vectorlength(16)
#pragma GCC ivdep
for(int i=0;i<16;i++)
{
a[i]=b[i]+c[i];
}
or should I use #define macros? (I'm not fond of macros, but I can use them if no other option is left.)
I'm trying to support #pragma omp simd safelen(16) for platforms that do not have OpenMP. The closest pragmas I found are GCC ivdep and vector always, but they are still not as fast as the OpenMP pragma. I'm probably missing some other pragmas.
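To make the macro option concrete, here is a minimal sketch of what I have in mind. It relies on the standard _Pragma operator so that one macro can expand to a different pragma per compiler. PORTABLE_PRAGMA, SIMD_HINT and add16 are hypothetical names I made up for illustration, and the float element type is an assumption (16 floats fill one 512-bit register). I have not confirmed that this matches the speed of the OpenMP pragma.

```cpp
// Hypothetical helper: stringizes its argument and passes it to the
// standard _Pragma operator, so a pragma can come out of a macro.
#define PORTABLE_PRAGMA(x) _Pragma(#x)

// SIMD_HINT is an illustrative name, not part of any compiler or library.
#if defined(_OPENMP)
    // OpenMP is enabled: use the pragma the other hints try to approximate.
    #define SIMD_HINT PORTABLE_PRAGMA(omp simd safelen(16))
#elif defined(__INTEL_COMPILER)
    // Checked before __GNUC__ because ICC on Linux also defines __GNUC__.
    #define SIMD_HINT PORTABLE_PRAGMA(vector always vectorlength(16))
#elif defined(__GNUC__) && !defined(__clang__)
    // Clang also defines __GNUC__ but does not know this pragma.
    #define SIMD_HINT PORTABLE_PRAGMA(GCC ivdep)
#else
    // Unknown compiler: expand to nothing and rely on auto-vectorization.
    #define SIMD_HINT
#endif

// Assumption: a, b and c each point to at least 16 floats.
inline void add16(const float* b, const float* c, float* a)
{
    SIMD_HINT
    for (int i = 0; i < 16; i++)
    {
        a[i] = b[i] + c[i];
    }
}
```

The loop body stays identical for every compiler and only the hint in front of it changes, so the macro is confined to a single line per loop.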
- a, b and c are simple arrays on the same stack, each aligned to 64 bytes.
- the function has __attribute__((always_inline)), which gives ICC a 4x speedup (though it is still 50% slower than GCC)
- ICC flags:
-std=c++14 -xCORE-AVX512 -qopt-zmm-usage=high -O3 -lgomp -fmath-errno -mprefer-vector-width=512 -ftree-vectorize -lpthread
- GCC flags:
-std=c++14 -march=cascadelake -fmath-errno -mavx512f -O3 -lgomp -mprefer-vector-width=512 -ftree-vectorize -lpthread
Lastly, why is there no equivalent of #pragma vector always for GCC?