I'm trying to bit-shift integers to the right using SSE intrinsics. The code below attempts this, but the output isn't what I expected; maybe I'm loading the numbers incorrectly or using the wrong intrinsic function. Here's the output:
2 4 8 16 32 64 128 1 2 4 8 16 32 64 128 0
512 1024 2048 4096 8192 16384 32768 0
0 8192 0 16384
8 0 16 0
I did try looking at this thread but that doesn't even try using the bitshift instructions with SSE intrinsics.
Here's the complete code (compile with SSE2 flag).
#include <emmintrin.h>
#include <stdio.h>
#include <stdint.h>
void print_16_num(__m128i var)
{
    uint8_t *val = (uint8_t*) &var;
    printf(" %i %i %i %i %i %i %i %i %i %i %i %i %i %i %i %i \n",
           val[0], val[1], val[2], val[3], val[4], val[5], val[6], val[7],
           val[8], val[9], val[10], val[11], val[12], val[13], val[14], val[15]);
}
void print_8_num( __m128i var)
{
    uint16_t *val = (uint16_t*) &var;
    printf(" %i %i %i %i %i %i %i %i \n",
           val[0], val[1], val[2], val[3], val[4], val[5], val[6], val[7]);
}
void print_4_num( __m128i var)
{
    uint16_t *val = (uint16_t*) &var;
    printf(" %i %i %i %i \n",
           val[0], val[1], val[2], val[3]);
}
int main()
{
    __m128i _16 = _mm_set_epi8( 128, 64, 32, 16, 8, 4, 2, 1, 128, 64, 32, 16, 8, 4, 2, 1);
    print_16_num(_mm_srli_si128(_16,1));
    __m128i _8 = _mm_set_epi16( 128, 64, 32, 16, 8, 4, 2, 1);
    print_8_num( _mm_srli_si128(_8,1));
    __m128i _4 = _mm_set_epi32( 128, 64, 32, 16);
    print_4_num( _mm_srli_si128(_4,1));
    _4 = _mm_set_epi32( 128, 64, 32, 16);
    print_4_num( _mm_srli_epi32(_4,1));
    return 0;
}
When you use the _mm_set_epi* functions, they accept their parameters as the most significant item first.
For example, the first statement,
__m128i _16 = _mm_set_epi8( 128, 64, 32, 16, 8, 4, 2, 1, 128, 64, 32, 16, 8, 4, 2, 1);
will load the variable with this value (most significant byte on the left):
0x80402010080402018040201008040201
Then you shift that 128-bit value right 1 byte with
_mm_srli_si128(_16,1)
and you get
0x00804020100804020180402010080402
When you read the individual byte values, byte[0] is the least significant byte, which is the one farthest to the right (so it prints 02 04 08 etc.).
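To see the argument order and the byte shift side by side, here is a minimal sketch. The helper names `shifted_bytes` and `print_bytes` are illustrative and not from the original post, and the sketch copies the register into a byte array with `_mm_storeu_si128` rather than casting a pointer to the vector.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: build the 16-byte vector from the question, shift the
   whole 128-bit register right by one byte, and copy the result to out[0..15].
   out[0] is the least significant byte of the register. */
void shifted_bytes(uint8_t out[16])
{
    /* Arguments run from the most significant byte (first) to the least
       significant byte (last). The cast keeps the 0x80 bit pattern. */
    __m128i v = _mm_set_epi8((char)128, 64, 32, 16, 8, 4, 2, 1,
                             (char)128, 64, 32, 16, 8, 4, 2, 1);
    __m128i s = _mm_srli_si128(v, 1);     /* shift right by 1 byte, zero fill */
    _mm_storeu_si128((__m128i *)out, s);  /* unaligned store, lowest byte first */
}

/* Hypothetical helper: print the 16 bytes, least significant first. */
void print_bytes(const uint8_t b[16])
{
    for (int i = 0; i < 16; i++)
        printf("%u ", (unsigned)b[i]);
    printf("\n");
}
```

Calling `shifted_bytes(b)` followed by `print_bytes(b)` should print the same sequence as the first line of the question's output, 2 4 8 16 32 64 128 1 2 4 8 16 32 64 128 0: the lowest byte (1) has been shifted out and a zero byte has been shifted in at the top.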
The same goes for the other statements, although I think you want to cast to uint32_t* inside the print_4_num function rather than uint16_t*.
For the last one,
_mm_srli_epi32(_4,1)
will shift each 32-bit value right by one bit, and it will become
0x00000040 00000020 00000010 00000008
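But it will print "8 0 16 0" because you are reading 16-bit values and not 32-bit values in the print_4_num function: each 32-bit element occupies two 16-bit slots, so you only see the low and high halves of the first two elements (8 and 16).
For an easy reference to see what functions do what, check out the Intel Intrinsics Guide: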
https://software.intel.com/sites/landingpage/IntrinsicsGuide/
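As a sketch of the fix suggested above, the 32-bit printer can read four `uint32_t` values instead of four `uint16_t` values. The names `store_4_num`, `print_4_num_u32`, and `shift_each_lane` are hypothetical and only serve this illustration; the register is copied out with `_mm_storeu_si128` instead of being read through a cast pointer.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: copy the four 32-bit elements to out[0..3],
   least significant element first. */
void store_4_num(__m128i var, uint32_t out[4])
{
    _mm_storeu_si128((__m128i *)out, var);
}

/* Variant of print_4_num that reads 32-bit elements rather than 16-bit ones. */
void print_4_num_u32(__m128i var)
{
    uint32_t val[4];
    store_4_num(var, val);
    printf(" %u %u %u %u \n",
           (unsigned)val[0], (unsigned)val[1],
           (unsigned)val[2], (unsigned)val[3]);
}

/* Hypothetical helper: the last case from the question. Each 32-bit element
   is shifted right by one bit independently of its neighbours. */
__m128i shift_each_lane(void)
{
    __m128i v = _mm_set_epi32(128, 64, 32, 16);  /* most significant first */
    return _mm_srli_epi32(v, 1);
}
```

With this version, `print_4_num_u32(shift_each_lane())` should print 8 16 32 64, which is each of 16, 32, 64, 128 halved, listed from the least significant element up.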