0xFFFF flags in SSE


I would like to create an SSE register with values that I can store in an array of integers, from another SSE register which contains flags 0xFFFF and zeros. For example:

__m128i regComp = _mm_cmpgt_epi16(regA, regB);

For the sake of argument, let's assume that regComp was loaded with { 0, 0xFFFF, 0, 0xFFFF }. I would like to convert this into, say, { 0, 80, 0, 80 }.

What I had in mind was to create an array of integers initialized to 80 and load it into a register regC, then do a _mm_and_si128 between regC and regComp and store the result in regD. However, this does not do the trick, which led me to think that I do not understand the positive flags in SSE registers. Could someone answer the question with a brief explanation of why my solution does not work?

short valA[16] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };
short valB[16] = { 5, 5, 5, 5, 5, 5, 5, 5, 5, 10, 10, 10, 10, 10, 10, 10 };
short ones[16] = { 1 };
short final[16];

__m128i vA, vB, vOnes, vRes, vRes2;

vOnes = _mm_load_si128((__m128i *)&(ones)[0] );

for( i=0 ; i < 16 ;i+=8){
   vA = _mm_load_si128((__m128i *)&(valA)[i] );
   vB = _mm_load_si128((__m128i *)&(valB)[i] );

   vRes = _mm_cmpgt_epi16(vA,vB);

   vRes2 = _mm_and_si128(vRes,vOnes);
   _mm_storeu_si128((__m128i *)&(final)[i], vRes2);
 }

There are 2 best solutions below

On BEST ANSWER

You only set the first element of array ones to 1 (the rest of the array is initialised to 0).

I suggest you get rid of the array ones altogether and then change this line:

vOnes = _mm_load_si128((__m128i *)&(ones)[0] );

to:

vOnes = _mm_set1_epi16(1);
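The same broadcast works for the value 80 from the question. Below is a minimal sketch of the compare-then-AND approach; the function name select_if_greater and its parameters are hypothetical, and it assumes the element count is a multiple of 8 and that an SSE2-capable x86 compiler is used.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: out[i] = (a[i] > b[i]) ? value : 0.
   Assumes n is a multiple of 8 (eight 16-bit lanes per register). */
void select_if_greater(const short *a, const short *b, short *out,
                       size_t n, short value)
{
    /* Broadcast value into all eight 16-bit lanes. */
    const __m128i vValue = _mm_set1_epi16(value);

    for (size_t i = 0; i < n; i += 8) {
        /* Unaligned loads: plain short arrays are not guaranteed
           to be 16-byte aligned. */
        __m128i vA = _mm_loadu_si128((const __m128i *)&a[i]);
        __m128i vB = _mm_loadu_si128((const __m128i *)&b[i]);

        /* 0xffff in each lane where a > b, 0x0000 elsewhere. */
        __m128i vMask = _mm_cmpgt_epi16(vA, vB);

        /* AND keeps value where the mask is all ones, zero elsewhere. */
        __m128i vOut = _mm_and_si128(vMask, vValue);

        _mm_storeu_si128((__m128i *)&out[i], vOut);
    }
}
```

Calling select_if_greater(valA, valB, final, 16, 80) with the arrays from the question should leave 80 in every position where valA exceeds valB and 0 everywhere else.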

Probably a better solution though, if you just want to convert SIMD TRUE (0xffff) results to 1, would be to use a shift:

for (i = 0; i < 16; i += 8) {
    vA = _mm_loadu_si128((__m128i *)&valA[i]);
    vB = _mm_loadu_si128((__m128i *)&valB[i]);

    vRes = _mm_cmpgt_epi16(vA, vB);    // generate 0xffff/0x0000 results

    vRes = _mm_srli_epi16(vRes, 15);   // convert to 1/0 results

    _mm_storeu_si128((__m128i *)&final[i], vRes);
}
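As a rough check that the shift and the AND with a broadcast 1 agree, here is a small sketch that computes both for a single group of eight lanes. The function name greater_as_01 and its two output parameters are made up for illustration. Note that the shift has to be the logical one (_mm_srli_epi16); the arithmetic shift _mm_srai_epi16 copies the sign bit and would leave 0xffff unchanged.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: for 8 lanes, writes 1 where a[i] > b[i] and 0
   elsewhere, once via a logical shift and once via AND with 1. */
void greater_as_01(const short *a, const short *b,
                   short *by_shift, short *by_and)
{
    __m128i vA = _mm_loadu_si128((const __m128i *)a);
    __m128i vB = _mm_loadu_si128((const __m128i *)b);

    /* 0xffff where a > b, 0x0000 elsewhere. */
    __m128i vMask = _mm_cmpgt_epi16(vA, vB);

    /* Logical right shift by 15: 0xffff -> 0x0001, 0x0000 -> 0x0000. */
    __m128i vShift = _mm_srli_epi16(vMask, 15);

    /* AND with 1 in every lane gives the same 1/0 result. */
    __m128i vAnd = _mm_and_si128(vMask, _mm_set1_epi16(1));

    _mm_storeu_si128((__m128i *)by_shift, vShift);
    _mm_storeu_si128((__m128i *)by_and, vAnd);
}
```

The shift variant needs no constant register, which is the main reason to prefer it when the target value is exactly 1; for any other value, such as 80, the AND with a broadcast constant is the general form.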
On

Try this for loading 1:

vOnes = _mm_set1_epi16(1);

This is shorter than creating a constant array.

Be careful: in C and C++, providing fewer initializer values than the array size initializes the remaining elements to zero. This was your error, not the SSE part.
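A tiny sketch of that initialization rule, with hypothetical helper names (count_nonzero, nonzero_in_partial_init) that only exist for this illustration:

```c
#include <stddef.h>

/* Count the nonzero entries of a short array. */
size_t count_nonzero(const short *v, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (v[i] != 0) {
            ++count;
        }
    }
    return count;
}

/* Builds the array exactly as in the question and reports how many
   of its 16 elements are nonzero. */
size_t nonzero_in_partial_init(void)
{
    short ones[16] = { 1 };  /* ones[0] is 1; ones[1]..ones[15] are 0 */
    return count_nonzero(ones, 16);
}
```

Only one element is nonzero, so ANDing the comparison mask with this array can produce a 1 in the first lane at most.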

Don't forget the debugger; modern ones display SSE variables properly.