Using SIMD to right shift 32 bit packed negative number

878 Views Asked by Isso At 15 August 2018 at 05:50

I'm writing some SSE/AVX code and there's a task to divide packed signed 32-bit (2's complement) integers by a power of 2. When the values are positive a right shift works fine, but it produces wrong results for negative values, because the sign bit gets shifted along with the rest.
Is there any SIMD operation that lets me shift preserving the position of the sign bit? Thanks

Peter Cordes On 15 August 2018 at 06:08 BEST ANSWER

SSE2 and AVX2 offer a choice of arithmetic¹ vs. logical right shifts for 16- and 32-bit element sizes. (For 64-bit elements, only logical is available until AVX512.)

Use _mm_srai_epi32 (psrad) instead of _mm_srli_epi32 (psrld).
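To make the difference concrete, here is a minimal sketch that shifts the same four values by 1 with both intrinsics. The helper names (shift_right_arith1, shift_right_logical1, example_shift_by_one) are made up for this illustration; only the intrinsics come from <emmintrin.h>.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Arithmetic shift (psrad): each 32-bit lane moves right by 1 and the
   vacated top bit is filled with a copy of the sign bit. */
static inline void shift_right_arith1(const int32_t in[4], int32_t out[4]) {
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_srai_epi32(v, 1));
}

/* Logical shift (psrld): the vacated top bit is filled with zero. */
static inline void shift_right_logical1(const int32_t in[4], int32_t out[4]) {
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_srli_epi32(v, 1));
}

/* Hypothetical self-check showing what each shift does to negative lanes. */
static inline void example_shift_by_one(void) {
    const int32_t in[4] = {-8, -7, 7, 8};
    int32_t arith[4], logical[4];

    shift_right_arith1(in, arith);
    shift_right_logical1(in, logical);

    /* Arithmetic: negatives stay negative (rounded toward -infinity). */
    assert(arith[0] == -4 && arith[1] == -4);
    assert(arith[2] == 3 && arith[3] == 4);

    /* Logical: the sign bit becomes 0, so -8 turns into a huge positive. */
    assert(logical[0] == 0x7FFFFFFC);
    assert(logical[2] == 3 && logical[3] == 4);
}
```

Non-negative lanes come out the same either way; only the negative lanes differ.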

See Intel's intrinsics guide, and other links in the SSE tag wiki https://stackoverflow.com/tags/sse/info. (Filter it to exclude AVX512 if you want, because it's pretty cluttered these days with all the masked versions for all 3 sizes...)

Or just look at the asm instruction-set reference, which includes intrinsics for instructions that have them. Searching for "arithmetic" in http://felixcloutier.com/x86/index.html finds the shifts you want.

Note the a=arithmetic vs. l=logical, instead of the usual intrinsics naming scheme of epu32 for unsigned. The asm mnemonics are simple and consistent (e.g. Packed Shift Right Arithmetic Dword = psrad).

Arithmetic right shifts are also available for AVX2 variable-shifts (vpsravd), and for the one-variable-for-all-elements version of the immediate shifts.
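As a rough sketch of the one-count-for-all-elements form, _mm_sra_epi32 takes the shift count from the low 64 bits of a second vector, so the count can be a run-time value. The wrapper name sra_all_lanes is just for this example. (The per-element AVX2 form is _mm_srav_epi32 / _mm256_srav_epi32; it is not shown here because it needs AVX2 support at compile time and run time.)

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Arithmetic right shift of four int32 lanes by one run-time count
   shared by all lanes (psrad with the count in an xmm register). */
static inline __m128i sra_all_lanes(__m128i v, int count) {
    assert(count >= 0);
    /* The count lives in the low 64 bits of the second operand. */
    __m128i cnt = _mm_cvtsi32_si128(count);
    return _mm_sra_epi32(v, cnt);
}
```

With counts above 31 every lane ends up as all copies of its sign bit, i.e. 0 or -1.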

Footnote 1:

Arithmetic right shifts shift in copies of the sign bit, instead of zero.

This correctly implements 2's complement signed division by powers of 2 with rounding towards negative infinity, unlike the truncation toward zero you get from C signed division. Look at the asm output for int foo(int a){return a/4;} to see how compilers implement signed division semantics in terms of shifts.
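If you do want C's truncating behaviour in SIMD, the usual compiler trick is to add 2^k - 1 to negative inputs before the arithmetic shift. The following is a sketch of that idea with SSE2, not the exact code any particular compiler emits; the names div_pow2_floor and div_pow2_trunc are invented for the example.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* a >> k per lane: division by 2^k rounding toward -infinity. */
static inline __m128i div_pow2_floor(__m128i a, int k) {
    assert(k >= 1 && k <= 31);
    return _mm_sra_epi32(a, _mm_cvtsi32_si128(k));
}

/* Division by 2^k rounding toward zero, like C's a / (1 << k). */
static inline __m128i div_pow2_trunc(__m128i a, int k) {
    assert(k >= 1 && k <= 31);
    /* sign is 0 for non-negative lanes, -1 (all ones) for negative lanes. */
    __m128i sign = _mm_srai_epi32(a, 31);
    /* Logical shift keeps the low k ones: bias is 2^k - 1 for negative
       lanes and 0 otherwise. */
    __m128i bias = _mm_srl_epi32(sign, _mm_cvtsi32_si128(32 - k));
    /* Adding the bias first makes the floor shift round toward zero. */
    return _mm_sra_epi32(_mm_add_epi32(a, bias), _mm_cvtsi32_si128(k));
}
```

For example, with k = 2 the value -7 becomes -2 with the plain arithmetic shift but -1 with the biased version, which matches -7 / 4 in C.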
