Determine what intrinsic flag is activated

Before I get into the specifics, here is the function I have.

Let _e and _w be arrays of equal size, and let _stepSize be of float type.

void GradientDescent::backUpWeights(FLOAT tdError) {
  AI::FLOAT multiplier = _stepSize * tdError;
  size_t n = getSize();
  for (UINT i = 0; i < n; i++){
    _w[i] += _e[i]*multiplier;
  }

  // Assumes the tilecode ensures that _w.size() and _e.size() are even.
}

This function works fine, but if the CPU supports SIMD intrinsics (SSE4 in this example), the version below shaves seconds off for the same input. Both versions are compiled with gcc -O3; the second one is additionally compiled with -msse4a.

void GradientDescent::backUpWeights(FLOAT tdError) {
  AI::FLOAT multiplier = _stepSize * tdError;
  __m128d multSSE = _mm_set_pd(multiplier, multiplier);
  __m128d* eSSE = (__m128d*)_e;
  __m128d* wSSE = (__m128d*)_w;
  size_t n = getSize()>>1;
  for (UINT i = 0; i < n; i++){
    wSSE[i] = _mm_add_pd(wSSE[i],_mm_mul_pd(multSSE, eSSE[i]));
  }

  // Assumes the tilecode ensures that _w.size() and _e.size() are even.
}

Problem:

What I want now is something like this:

void GradientDescent::backUpWeights(FLOAT tdError) {
  AI::FLOAT multiplier = _stepSize * tdError;
#ifdef _msse4a_defined_  // placeholder for whatever macro -msse4a defines
  __m128d multSSE = _mm_set_pd(multiplier, multiplier);
  __m128d* eSSE = (__m128d*)_e;
  __m128d* wSSE = (__m128d*)_w;
  size_t n = getSize()>>1;
  for (UINT i = 0; i < n; i++){
    wSSE[i] = _mm_add_pd(wSSE[i],_mm_mul_pd(multSSE, eSSE[i]));
  }
#else  // No intrinsic
  size_t n = getSize();
  for (UINT i = 0; i < n; i++){
    _w[i] += _e[i]*multiplier;
  }
#endif

  // Assumes the tilecode ensures that _w.size() and _e.size() are even.
}

Thus, if I pass -msse4a to gcc when compiling this code, the compiler should pick the code in the #ifdef branch; otherwise it should compile the plain loop. And of course, my plan is to do this for every intrinsic instruction set, not just for SSE4A as above.

There are 2 best solutions below

Best answer

GCC, ICC (on Linux), and Clang have the following compile options with corresponding defines:

options       define
-mfma         __FMA__
-mavx2        __AVX2__
-mavx         __AVX__
-msse4.2      __SSE4_2__
-msse4.1      __SSE4_1__
-mssse3       __SSSE3__
-msse3        __SSE3__
-msse2        __SSE2__
-m64          __SSE2__
-msse         __SSE__    

Options and defines in GCC and Clang but not in ICC:

-msse4a       __SSE4A__
-mfma4        __FMA4__
-mxop         __XOP__

AVX512 options which are defined in recent versions of GCC, Clang, and ICC

-mavx512f     __AVX512F__    //foundation instructions
-mavx512pf    __AVX512PF__   //pre-fetch instructions
-mavx512er    __AVX512ER__   //exponential and reciprocal instructions 
-mavx512cd    __AVX512CD__   //conflict detection instructions

AVX512 options which will likely be in GCC, Clang, and ICC soon (if not already):

-mavx512bw    __AVX512BW__   //byte and word instructions
-mavx512dq    __AVX512DQ__   //doubleword and quadword Instructions
-mavx512vl    __AVX512VL__   //vector length extensions

Note that many of these switches enable several more: e.g. -mavx2 also enables and defines AVX, SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, and SSE.
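One way to see which of these macros a given set of flags actually defines is to test them in code. The following is only a minimal sketch: the function names enabledExtensions, has, and checkImpliedExtensions are made up for this illustration, and the two implications it checks are the ones I would expect from GCC and Clang.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the names of the x86 SIMD macros this translation unit was
// compiled with. The result depends entirely on the compiler flags.
// (enabledExtensions is a hypothetical helper, not part of any library.)
std::vector<std::string> enabledExtensions() {
  std::vector<std::string> out;
#ifdef __SSE__
  out.push_back("SSE");
#endif
#ifdef __SSE2__
  out.push_back("SSE2");
#endif
#ifdef __SSE3__
  out.push_back("SSE3");
#endif
#ifdef __SSSE3__
  out.push_back("SSSE3");
#endif
#ifdef __SSE4_1__
  out.push_back("SSE4.1");
#endif
#ifdef __SSE4_2__
  out.push_back("SSE4.2");
#endif
#ifdef __SSE4A__
  out.push_back("SSE4A");
#endif
#ifdef __AVX__
  out.push_back("AVX");
#endif
#ifdef __AVX2__
  out.push_back("AVX2");
#endif
#ifdef __FMA__
  out.push_back("FMA");
#endif
#ifdef __AVX512F__
  out.push_back("AVX512F");
#endif
  return out;
}

// True if `name` appears in the list returned by enabledExtensions().
bool has(const std::vector<std::string>& exts, const std::string& name) {
  for (const std::string& e : exts) {
    if (e == name) return true;
  }
  return false;
}

// Sanity check for the "one switch enables several more" behaviour:
// a newer extension being defined should imply the older ones.
void checkImpliedExtensions() {
  const std::vector<std::string> exts = enabledExtensions();
  assert(!has(exts, "AVX2") || has(exts, "AVX"));
  assert(!has(exts, "SSE4.2") || has(exts, "SSE2"));
}
```

With GCC or Clang you can also dump the predefined macros without writing any code, e.g. gcc -msse4a -dM -E - < /dev/null, and search the output for the names in the tables above.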

I'm not 100% sure what the ICC compiler option for AVX512 is. It could be -xMIC-AVX512 instead of -mavx512f.

MSVC only appears to define __AVX__ and __AVX2__.

In your case, your code appears to use only SSE2, so if you compile in 64-bit mode (which is the default in a 64-bit user space, or explicitly with -m64), __SSE2__ is defined. Since you used -msse4a, __SSE4A__ will be defined as well.
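Applied to the function from the question, a compile-time guard could look roughly like the sketch below. It is written as a free function over raw double arrays rather than as the original GradientDescent member, which is an assumption made to keep it self-contained; USE_SSE2 is a helper macro invented for this example. The _M_X64 and _M_IX86_FP checks are an addition for MSVC, which does not define __SSE2__.

```cpp
#include <cassert>
#include <cstddef>

// SSE2 is available if GCC/Clang/ICC define __SSE2__, or if MSVC targets
// x64 (_M_X64) or 32-bit x86 with /arch:SSE2 or higher (_M_IX86_FP >= 2).
#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define USE_SSE2 1      // hypothetical helper macro for this sketch
#endif

// Computes w[i] += e[i] * multiplier for i in [0, n).
void backUpWeights(double* w, const double* e, double multiplier,
                   std::size_t n) {
  assert(n == 0 || (w != nullptr && e != nullptr));
  std::size_t i = 0;
#ifdef USE_SSE2
  // Two doubles per iteration; unaligned loads/stores, so the arrays
  // do not have to be 16-byte aligned.
  const __m128d mult = _mm_set1_pd(multiplier);
  for (; i + 2 <= n; i += 2) {
    const __m128d wv = _mm_loadu_pd(w + i);
    const __m128d ev = _mm_loadu_pd(e + i);
    _mm_storeu_pd(w + i, _mm_add_pd(wv, _mm_mul_pd(mult, ev)));
  }
#endif
  // Scalar loop: handles everything without SSE2, and the odd tail with it.
  for (; i < n; ++i) {
    w[i] += e[i] * multiplier;
  }
}
```

Because the scalar loop picks up wherever the vector loop stopped, this version does not rely on the array size being even.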

Note that enabling an instruction set at compile time is not the same as determining whether that instruction set is available on the CPU the program runs on. If you want your code to work on multiple instruction sets, then I suggest a CPU dispatcher.
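A CPU dispatcher can be sketched as follows for GCC and Clang on x86, using their __builtin_cpu_supports built-in and the target function attribute. This is one possible shape under those assumptions, not the only way to do it; the names axpyScalar, axpyAvx, selectAxpy, backUpWeightsDispatched, and HAVE_X86_DISPATCH are invented for the example. On other compilers or architectures the sketch simply falls back to the scalar loop.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1  // hypothetical helper macro for this sketch
#endif

// Portable fallback: w[i] += e[i] * m for every element.
static void axpyScalar(double* w, const double* e, double m, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    w[i] += e[i] * m;
  }
}

#ifdef HAVE_X86_DISPATCH
// Compiled for AVX regardless of the -m flags on the command line,
// so it must only be called after a run-time check.
__attribute__((target("avx")))
static void axpyAvx(double* w, const double* e, double m, std::size_t n) {
  const __m256d mv = _mm256_set1_pd(m);
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {  // four doubles per iteration
    const __m256d wv = _mm256_loadu_pd(w + i);
    const __m256d ev = _mm256_loadu_pd(e + i);
    _mm256_storeu_pd(w + i, _mm256_add_pd(wv, _mm256_mul_pd(mv, ev)));
  }
  for (; i < n; ++i) {  // scalar tail
    w[i] += e[i] * m;
  }
}
#endif

using AxpyFn = void (*)(double*, const double*, double, std::size_t);

// Picks an implementation based on what the running CPU reports.
static AxpyFn selectAxpy() {
#ifdef HAVE_X86_DISPATCH
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx")) {
    return axpyAvx;
  }
#endif
  return axpyScalar;
}

void backUpWeightsDispatched(double* w, const double* e, double m,
                             std::size_t n) {
  static const AxpyFn fn = selectAxpy();  // resolved once, on first call
  assert(fn != nullptr);
  fn(w, e, m, n);
}
```

The point of the design is that the binary contains both code paths and chooses between them at run time, so a single build can run on CPUs with and without AVX.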

Another answer

I later learned that there is no way to do this. A simple and elegant alternative, though, is to define the macro yourself in the build. For an x86-64 platform with SSE4A intrinsics, use the following make target rule (assuming that you store the intrinsic sources in src/intrinsic/ and the build outputs (.o files) in build/):

CXX=g++ -O3
CXXFLAGS=-std=c++14 -Wunused
CPPFLAGS=
CPP_INTRINSIC_FLAG:=-ffast-math

INTRINSIC_OBJECT := $(patsubst src/intrinsic/%.cpp,build/%.o,$(wildcard src/intrinsic/*.cpp))

x86-64-sse4: $(eval CPP_INTRINSIC_FLAG+=-msse4a -DSSE4A) $(INTRINSIC_OBJECT)

# Intrinsic objects (the recipe line below must begin with a tab character)
build/%.o: src/intrinsic/%.cpp
	$(CXX) $(CXXFLAGS) -c $(CPPFLAGS) $(CPP_INTRINSIC_FLAG) $(INCLUDE_PATHS) $^ -o $@