SSE intrinsics bit shifting to the right


I'm trying to shift integers to the right using SSE intrinsics. The code below attempts this, but the output is not what I expected. Maybe I'm loading the numbers incorrectly or using the wrong intrinsic. Here's the output:

 2 4 8 16 32 64 128 1 2 4 8 16 32 64 128 0
 512 1024 2048 4096 8192 16384 32768 0
 0 8192 0 16384
 8 0 16 0

I did look at this thread, but it doesn't use the shift instructions through SSE intrinsics at all.

Here's the complete code (compile with SSE2 flag).

#include <emmintrin.h>
#include <stdio.h>
#include <stdint.h>

void print_16_num(__m128i var)
{
    uint8_t *val = (uint8_t*) &var;
    printf(" %i %i %i %i %i %i %i %i %i %i %i %i %i %i %i %i \n",
           val[0], val[1], val[2], val[3], val[4], val[5], val[6], val[7],val[8], val[9], val[10], val[11], val[12], val[13], val[14], val[15]);
}
void print_8_num( __m128i var)
{
    uint16_t *val = (uint16_t*) &var;
    printf(" %i %i %i %i %i %i %i %i \n",
           val[0], val[1], val[2], val[3], val[4], val[5], val[6], val[7]);
}
void print_4_num( __m128i var)
{
    uint16_t *val = (uint16_t*) &var;
    printf(" %i %i %i %i \n",
           val[0], val[1], val[2], val[3]);
}
int main()
{
    __m128i _16 = _mm_set_epi8( 128, 64, 32, 16, 8, 4, 2, 1, 128, 64, 32, 16, 8, 4, 2, 1);
    print_16_num(_mm_srli_si128(_16,1));

    __m128i _8 = _mm_set_epi16( 128, 64, 32, 16, 8, 4, 2, 1);
    print_8_num( _mm_srli_si128(_8,1));

    __m128i _4 = _mm_set_epi32( 128, 64, 32, 16);
    print_4_num( _mm_srli_si128(_4,1));

    _4 = _mm_set_epi32( 128, 64, 32, 16);
    print_4_num( _mm_srli_epi32(_4,1));

    return 0;
}

Best answer

The _mm_set_epi* functions take their parameters with the most significant element first.

For example, the first statement,

__m128i _16 = _mm_set_epi8( 128, 64, 32, 16, 8, 4, 2, 1, 128, 64, 32, 16, 8, 4, 2, 1);

will load the variable with this value:

0x80402010080402018040201008040201
 (128,64,32 ...)

Then you shift that 128-bit value right 1 byte with _mm_srli_si128(_16,1) and you get

0x00804020100804020180402010080402

When you read the individual byte values, byte[0] is the least significant byte, which is the one farthest to the right in the hex value above (so it prints 2 4 8 and so on).

The same applies to the other statements, although I think you want to cast to uint32_t* rather than uint16_t* inside the print_4_num function.

For the last one, _mm_srli_epi32(_4,1) will shift the value

0x00000080000000400000002000000010
       (128)   (64)    (32)    (16)

right by one bit and it will become

0x00000040000000200000001000000008

But it will print "8 0 16 0" because you are reading 16-bit values and not 32-bit values in the print_4_num function:

0x0000 0040 0000 0020 0000 0010 0000 0008
     (not used)        [3]  [2]  [1]  [0]

For an easy reference to see what functions do what, check out the Intel Intrinsics Guide:

https://software.intel.com/sites/landingpage/IntrinsicsGuide/