I am trying to figure out how to use the SSSE3 intrinsic _mm_shuffle_epi8 to compact a 128-bit register.
Let's say I have an input variable
__m128i target
which holds eight 16-bit values, denoted:
a[0], a[1] ... a[7]. // each slot is 16 bits
My output is called:
__m128i output
I also have a bit vector of size 8:
char bit_mask // 8 bits; the i-th bit indicates whether
// the corresponding a[i] should be included
Given bit_mask and the input target, how can I get the final result?
For example, if my bit vector is:
[0 1 1 0 0 0 0 0]
then I want the result to be:
output = [a[1], a[2], ... ]
Is there a known way to do this using _mm_shuffle_epi8?
Assume I use a lookup array: _mm_shuffle_epi8(target, mask_lookup[bit_mask]);
How do I create the array?
A simple and very fast approach, at the cost of 4 KB of table space, is to store all 256 possible shuffle masks (16 bytes each) in a table indexed by the bit vector.
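To make the table idea concrete, here is a minimal sketch of how the masks could be generated at startup and then used. It rests on a few assumptions that are not in the question: bit 0 (the least significant bit) of bit_mask corresponds to a[0], the unused tail of the output is zeroed, and the names build_mask_lookup, compact_epi16 and SSSE3_FN are made up for illustration. The SSSE3_FN macro is only there so that GCC/Clang accept the intrinsic without -mssse3 on the command line.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 */

/* Hypothetical helper macro: enable SSSE3 for a single function on GCC/Clang. */
#if defined(__GNUC__) || defined(__clang__)
#define SSSE3_FN __attribute__((target("ssse3")))
#else
#define SSSE3_FN
#endif

/* 256 shuffle masks of 16 bytes each = 4 KB. */
static uint8_t mask_lookup[256][16];

/* Fill the table. Assumption: bit i of the index (bit 0 = LSB) selects a[i]. */
static void build_mask_lookup(void)
{
    for (int m = 0; m < 256; ++m) {
        int k = 0;  /* next free 16-bit slot in the output */

        /* A control byte with its high bit set makes pshufb write zero,
           so every slot that is not filled below ends up as 0. */
        memset(mask_lookup[m], 0x80, 16);

        for (int i = 0; i < 8; ++i) {
            if (m & (1 << i)) {
                /* Copy both bytes of a[i] into output slot k. */
                mask_lookup[m][2 * k]     = (uint8_t)(2 * i);
                mask_lookup[m][2 * k + 1] = (uint8_t)(2 * i + 1);
                ++k;
            }
        }
        assert(k <= 8);  /* never more than eight slots kept */
    }
}

/* Pack the selected 16-bit slots of target to the front of the result. */
SSSE3_FN static __m128i compact_epi16(__m128i target, uint8_t bit_mask)
{
    __m128i shuf = _mm_loadu_si128((const __m128i *)mask_lookup[bit_mask]);
    return _mm_shuffle_epi8(target, shuf);
}
```

Call build_mask_lookup() once before the first compact_epi16(). With the bit vector from the question (bits 1 and 2 set, i.e. bit_mask = 0x06 under the bit-order assumption above), the result starts with a[1], a[2] and the remaining slots are zero. The number of valid slots in the output is the population count of bit_mask.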