Using xmm parameter in AVX intrinsics


Is it possible to pass XMM register arguments to AVX intrinsic functions (_mm256_**_**)?

My code requires vector integer operations (for loading and storing data) along with vector floating-point operations. The integer code is written with SSE2 intrinsics to stay compatible with older CPUs, while the floating-point code is written with AVX for speed (there is also an SSE code branch, so please do not suggest that).

Apart from using a compiler flag that automatically converts all SSE instructions to their VEX-encoded versions, is there any way, using only intrinsic functions (i.e. no inline or external assembly), to force the use of VEX-encoded instructions on XMM registers?

Note: I tried _mm256_castsi128_si256(), but it generates an instruction with a ymm operand.


There is 1 best solution below


You have a processor with AVX. It does not have separate XMM registers: each XMM register is just the lower 128 bits of the corresponding YMM register. If you compile all your code with AVX support (e.g. with -mavx in GCC or /arch:AVX in MSVC), the compiler emits the VEX-encoded form of every SSE2 intrinsic, so all your SSE2 code operates on the lower 128 bits of the YMM registers. There is nothing to worry about.
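As a rough illustration of mixing the two kinds of intrinsics in one function, here is a minimal sketch. The function name scale_i32_to_f32 is made up for this example, and the GCC/Clang-specific target("avx") attribute is an assumption used so the snippet builds without extra flags; compiling the whole file with -mavx should have the same effect on code generation.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper (not from the question): converts 8 int32 values
 * to float and scales them. The target attribute is GCC/Clang-specific
 * and stands in for compiling the whole file with -mavx. */
__attribute__((target("avx")))
void scale_i32_to_f32(const int32_t *src, float *dst, float factor)
{
    /* SSE2 intrinsics: two unaligned 128-bit integer loads. */
    __m128i lo_i = _mm_loadu_si128((const __m128i *)src);
    __m128i hi_i = _mm_loadu_si128((const __m128i *)(src + 4));

    /* SSE2 intrinsics: int32 -> float conversion on 128-bit vectors. */
    __m128 lo_f = _mm_cvtepi32_ps(lo_i);
    __m128 hi_f = _mm_cvtepi32_ps(hi_i);

    /* AVX intrinsics: join both halves into one 256-bit vector. */
    __m256 v = _mm256_insertf128_ps(_mm256_castps128_ps256(lo_f), hi_f, 1);

    /* AVX intrinsics: multiply all 8 lanes and store the result. */
    v = _mm256_mul_ps(v, _mm256_set1_ps(factor));
    _mm256_storeu_ps(dst, v);
}
```

Because AVX is enabled for the whole function, the compiler is expected to pick VEX encodings for the _mm_* calls as well; inspecting the disassembly (for example with objdump -d) is the way to confirm this for a given compiler.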

However, suppose you have two different modules, one compiled with SSE2 support (e.g. with -msse2 in GCC or /arch:SSE2 in MSVC) and the other with AVX support, and you use functions from both. Then you do have something to worry about when you switch between them. In that case you should call _mm256_zeroupper() or _mm256_zeroall() when you switch from AVX to SSE2 code, unless you want to take a performance hit. See: Using AVX CPU instructions: Poor performance without "/arch:AVX"
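A minimal sketch of that transition, under some assumptions: legacy_sse_sum4 is a hypothetical stand-in for a function from the module built without AVX, sum8_then_call_legacy is likewise invented for illustration, and the noinline and target("avx") attributes are GCC/Clang-specific ways to imitate two separately compiled modules in a single file.

```c
#include <immintrin.h>

/* Hypothetical stand-in for code from a module built without AVX.
 * noinline keeps it out of the AVX caller. Returns p[0]+p[1]+p[2]+p[3]. */
__attribute__((noinline))
float legacy_sse_sum4(const float *p)
{
    __m128 v    = _mm_loadu_ps(p);                               /* a b c d         */
    __m128 sh   = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1)); /* b a d c         */
    __m128 sums = _mm_add_ps(v, sh);                             /* a+b a+b c+d c+d */
    sh   = _mm_movehl_ps(sh, sums);                              /* c+d in lane 0   */
    sums = _mm_add_ss(sums, sh);                                 /* a+b+c+d, lane 0 */
    return _mm_cvtss_f32(sums);
}

/* Hypothetical AVX caller: adds the two 128-bit halves of 8 floats,
 * then hands the 4 partial sums to the legacy function. */
__attribute__((target("avx")))
float sum8_then_call_legacy(const float *p)
{
    __m256 v  = _mm256_loadu_ps(p);
    __m128 lo = _mm256_castps256_ps128(v);
    __m128 hi = _mm256_extractf128_ps(v, 1);

    float partial[4];
    _mm_storeu_ps(partial, _mm_add_ps(lo, hi));

    /* Clear the upper halves of the YMM registers before running
     * code that may use non-VEX SSE encodings. */
    _mm256_zeroupper();
    return legacy_sse_sum4(partial);
}
```

Some compilers insert vzeroupper on their own at such boundaries when AVX is enabled, so the explicit call may be redundant there; it should be harmless either way.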

The simple solution is to just compile all your code with AVX support. The only reason I can think of to compile different modules with different instruction sets is to build a CPU dispatcher so your code can run on different processors. That's a bit of a pain to implement. But then you don't switch states, because on any given processor only one of the code paths runs. So the only time I can think of that you need to worry about a state change is when you call functions from a shared library that was compiled with another instruction set (e.g. a DLL compiled with SSE2). In that case you may need to call _mm256_zeroupper() or _mm256_zeroall() before calling the library function from AVX code.
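For what the dispatcher idea might look like, here is a small sketch. It assumes GCC or Clang on x86, using their __builtin_cpu_supports builtin and the target("avx") attribute; the names add_sse, add_avx and add_arrays are hypothetical, and a real dispatcher would usually cache the choice in a function pointer instead of testing on every call.

```c
#include <immintrin.h>
#include <stddef.h>

/* Baseline path: SSE only, 4 floats per iteration. */
static void add_sse(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        _mm_storeu_ps(out + i,
                      _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    }
    for (; i < n; ++i) {          /* scalar tail */
        out[i] = a[i] + b[i];
    }
}

/* AVX path: 8 floats per iteration. The attribute is GCC/Clang-specific. */
__attribute__((target("avx")))
static void add_avx(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        _mm256_storeu_ps(out + i,
                         _mm256_add_ps(_mm256_loadu_ps(a + i),
                                       _mm256_loadu_ps(b + i)));
    }
    for (; i < n; ++i) {          /* scalar tail */
        out[i] = a[i] + b[i];
    }
    _mm256_zeroupper();           /* the caller may be non-VEX code */
}

/* Hypothetical dispatcher: picks a path from a run-time CPU check. */
void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx")) {
        add_avx(a, b, out, n);
    } else {
        add_sse(a, b, out, n);
    }
}
```

On a given machine only one of the two paths ever runs, which is why the dispatcher itself does not cause repeated AVX/SSE transitions.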