SIMD SSE2: __m128i contains 4 x int32_t; how to quickly check whether each integer is greater or smaller than 0


I used SIMD to do an arithmetic operation, and the result is in a __m128i variable which contains 4 x int32_t. I suspect the first two int32_t values in the result are >= 0 and the last two values are <= 0. How can I quickly check this?

__m128i result {int32_t, int32_t, int32_t, int32_t}

I suspect result {>=0,>=0,<=0,<=0}

What is the most efficient way of doing this?

There are 1 best solutions below

It's unclear whether you want the result of this in an XMM register in preparation for some masking, or in a general-purpose register (GPR) in preparation for, say, branching.

Alternative 1

This may be a more flexible alternative because it leaves a mask in an XMM register, and from there to the GPRs is just a PMOVMSKB away. It does however cost two 128-bit constants.

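This is the simple approach: compare for > -1 (that is, >= 0) on the top two elements and give an impossible comparison (> INT_MAX) on the bottom two, then compare for < 1 (that is, <= 0) on the bottom two and give an impossible comparison (< INT_MIN) on the top two. OR the two results together and you have your mask. If all bits are set, every integer met its condition and the test is true; otherwise it is false. Here "top" means elements 3 and 2, which are the first two arguments of _mm_set_epi32, since that intrinsic lists elements from highest to lowest.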

__m128i result;
/* ... */
__m128i TOP  = _mm_set_epi32(0xFFFFFFFF, 0xFFFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF);
__m128i BOT  = _mm_set_epi32(0x80000000, 0x80000000, 0x00000001, 0x00000001);
__m128i cmpT = _mm_cmpgt_epi32(result, TOP); // Top    > -1,  Bottom > INT_MAX
__m128i cmpB = _mm_cmpgt_epi32(BOT, result); // Bottom <  1,  Top    < INT_MIN
__m128i cmp  = _mm_or_si128(cmpT, cmpB);
int cond     = _mm_movemask_epi8(cmp) == 0xFFFF;
/* cond contains the result of the comparison:
      0 if check failed and
      1 if check satisfied.                    */
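
For a self-contained version of Alternative 1, the snippet can be wrapped in a function. This is only a minimal sketch: the name high_nonneg_low_nonpos is hypothetical, and I've written the constants as -1, INT_MAX, INT_MIN and 1 instead of hex literals, which is an equivalent spelling and not part of the snippet above.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <limits.h>     /* INT_MAX, INT_MIN */

/* Hypothetical helper: returns 1 when elements 3 and 2 are >= 0
   and elements 1 and 0 are <= 0, otherwise 0. */
static inline int high_nonneg_low_nonpos(__m128i v)
{
    /* _mm_set_epi32 takes elements from highest (3) to lowest (0). */
    const __m128i top = _mm_set_epi32(-1, -1, INT_MAX, INT_MAX);
    const __m128i bot = _mm_set_epi32(INT_MIN, INT_MIN, 1, 1);

    __m128i cmp_t = _mm_cmpgt_epi32(v, top);  /* high: v > -1, low: never true  */
    __m128i cmp_b = _mm_cmpgt_epi32(bot, v);  /* low:  v <  1, high: never true */

    /* All 16 byte-mask bits set means every element passed its test. */
    return _mm_movemask_epi8(_mm_or_si128(cmp_t, cmp_b)) == 0xFFFF;
}
```

Because of the highest-to-lowest argument order, a vector loaded from an int32_t array {a0, a1, a2, a3} with _mm_loadu_si128 is checked for a2, a3 >= 0 and a0, a1 <= 0. If the expected signs are the other way round in memory, swap the two halves of both constants.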

Alternative 2

Here I apply PMOVMSKB to both the original value and its negation (computed with PSUBD as 0 - result), then check the relevant bits of the two returned bitmasks.

__m128i result;
/* ... */
__m128i ZERO = _mm_setzero_si128();            /* 0 constant */
__m128i neg  = _mm_sub_epi32(ZERO, result);    /* Negate */
int lt0      = _mm_movemask_epi8(result);      /* < 0 ? */
int gt0      = _mm_movemask_epi8(neg);         /* > 0 ? */
gt0         &= ~lt0;                           /* Correction for INT_MIN. Can be
                                                  deleted if never encountered. */
int cond     = !((gt0 | (lt0 >> 8)) & 0x88);   /* Check both bits 3 and 7 are 0 */
/* cond contains the result of the comparison:
      0 if check failed and
      1 if check satisfied.                    */

My explanation:

  • I negate the integers.
  • I extract the sign bits, lt0, from the integers. They represent the condition result[i] < 0.
  • I extract the sign bits, gt0, from the negations. They represent the condition result[i] > 0, except when result[i] is INT_MIN, whose negation wraps around to INT_MIN again.
    • Optional: I correct for that case with gt0 &= ~lt0, which clears any false report that -2147483648 is > 0.
  • I then check whether all of the following holds:
    • Bit 3 of gt0 is 0. Implies result[0] <= 0.
    • Bit 7 of gt0 is 0. Implies result[1] <= 0.
    • Bit 11 of lt0 is 0. Implies result[2] >= 0.
    • Bit 15 of lt0 is 0. Implies result[3] >= 0.
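
The steps above can be collected into one function. This is a sketch under the same assumptions as the snippet (elements 3 and 2 must be >= 0, elements 1 and 0 must be <= 0); the name check_alt2 is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <limits.h>     /* INT_MIN, INT_MAX (used by callers and tests) */

/* Hypothetical helper wrapping Alternative 2: returns 1 when
   elements 3 and 2 are >= 0 and elements 1 and 0 are <= 0. */
static inline int check_alt2(__m128i v)
{
    __m128i neg = _mm_sub_epi32(_mm_setzero_si128(), v);  /* 0 - v */

    int lt0 = _mm_movemask_epi8(v);    /* bit 4*i+3 set when element i < 0 */
    int gt0 = _mm_movemask_epi8(neg);  /* bit 4*i+3 set when element i > 0,
                                          or when element i == INT_MIN     */
    gt0 &= ~lt0;                       /* drop the INT_MIN false positives */

    /* Bits 3 and 7 of gt0 cover elements 0 and 1; shifting lt0 right
       by 8 moves bits 11 and 15 (elements 2 and 3) onto bits 3 and 7. */
    return !((gt0 | (lt0 >> 8)) & 0x88);
}
```

With INT_MIN in the low elements, for example _mm_set_epi32(0, 0, INT_MIN, INT_MIN), the correction step is what keeps the check from failing.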

There is a reason why we look at bits 3, 7, 11 and 15, and a reason why we use the magic 8 and 0x88 constants. PMOVMSKB returns one sign bit per byte, not one sign bit per dword, so the only bits we care about are the sign bits of the top byte of each integer, and they are surrounded by junk bits that we must ignore. The shift by 8 moves bits 11 and 15 of lt0 down to bits 3 and 7, and 0x88 selects exactly those two bits.
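
To make the junk bits concrete, here is a small sketch that pulls the four dword sign bits out of a PMOVMSKB result. The helper names are hypothetical. The second function is a variant that is not part of the approach above: it reinterprets the register as four floats and uses MOVMSKPS (_mm_movemask_ps), which returns one bit per dword, so no junk bits appear in the first place.

```c
#include <emmintrin.h>  /* SSE2: _mm_movemask_epi8, _mm_castsi128_ps */
#include <xmmintrin.h>  /* SSE:  _mm_movemask_ps */

/* Hypothetical helper: keeps only bits 3, 7, 11 and 15 of the
   PMOVMSKB mask and packs them into bits 0..3 (bit i = sign of element i). */
static inline int dword_signs_from_bytes(__m128i v)
{
    int m = _mm_movemask_epi8(v);  /* one bit per byte, 16 bits total */
    return ((m >> 3) & 1) | ((m >> 6) & 2) | ((m >> 9) & 4) | ((m >> 12) & 8);
}

/* Variant (an assumption on my part, not from the answer above):
   MOVMSKPS yields one sign bit per 32-bit lane directly. */
static inline int dword_signs_from_ps(__m128i v)
{
    return _mm_movemask_ps(_mm_castsi128_ps(v));
}
```

For instance, element 1 = 0x00FF0000 is positive, yet its third byte has the high bit set, so the raw PMOVMSKB mask has bit 6 set even though no element is negative.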

In total this makes 9-10 instructions to run the check.