I used SIMD to do an arithmetic operation; the result is in a __m128i variable which contains 4 x int32_t. I suspect the first two int32_t values in the result are >= 0 and the last two values are <= 0. How could I quickly find out?
__m128i result {int32_t, int32_t, int32_t, int32_t}
I suspect result {>=0,>=0,<=0,<=0}
What is the most efficient way of doing this?
It's unclear whether you want the result of this in an XMM register in preparation for some masking, or in a general-purpose register (GPR) in preparation for, say, branching.
Alternative 1
This may be a more flexible alternative because it leaves a mask in an XMM register, and from there to the GPRs is just a PMOVMSKB away. It does however cost two 128-bit constants.
This is the simple approach: compare for > -1 (i.e. >= 0) on the top two lanes while giving the bottom two lanes an impossible comparison, then compare for < 1 (i.e. <= 0) on the bottom two lanes while giving the top two lanes an impossible comparison. OR the two results together and you have your mask. If all bits are set, every integer met its condition and the test is true; otherwise it is false.
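A minimal sketch of this approach follows. It is not necessarily the code originally posted: the helper names sign_pattern_mask and sign_pattern_holds are hypothetical, and it assumes "top" means lanes 3 and 2 and "bottom" means lanes 1 and 0, in _mm_set_epi32 argument order. The impossible comparisons are "> INT_MAX" and "< INT_MIN", which always yield 0.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper. Assumed layout: lanes 3 and 2 ("top") must be >= 0,
   lanes 1 and 0 ("bottom") must be <= 0. Returns an all-ones mask in every
   lane that meets its condition. */
static __m128i sign_pattern_mask(__m128i v)
{
    /* Top lanes: v > -1 means v >= 0. Bottom lanes: v > INT_MAX is impossible. */
    const __m128i lo = _mm_set_epi32(-1, -1, INT_MAX, INT_MAX);
    /* Top lanes: v < INT_MIN is impossible. Bottom lanes: v < 1 means v <= 0. */
    const __m128i hi = _mm_set_epi32(INT_MIN, INT_MIN, 1, 1);

    __m128i ge0_top = _mm_cmpgt_epi32(v, lo);  /* PCMPGTD */
    __m128i le0_bot = _mm_cmplt_epi32(v, hi);  /* PCMPGTD with swapped operands */
    return _mm_or_si128(ge0_top, le0_bot);     /* POR */
}

/* Moves the mask to a GPR with PMOVMSKB; all 16 bits set means every
   lane met its condition. */
static bool sign_pattern_holds(__m128i v)
{
    return _mm_movemask_epi8(sign_pattern_mask(v)) == 0xFFFF;
}
```

If the mask is only needed for further SIMD masking, sign_pattern_mask alone is enough and the PMOVMSKB step can be skipped.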
Alternative 2
I've exploited PMOVMSKB on both the original value and its PSUBD negation, then checked the right bits of both returned bitmasks for the right value.
My explanation:
lt0 holds the sign bits from the integers. Its bits 3, 7, 11 and 15 represent the condition result[i] < 0.
gt0 holds the sign bits from the negations. Its bits 3, 7, 11 and 15 represent the condition result[i] > 0, with the exception of when result[i] was INT_MIN (gt0 &= ~lt0 sets to 0 any false reports that -2147483648 is > 0).
Bit 3 of gt0 is 0. Implies result[0] <= 0.
Bit 7 of gt0 is 0. Implies result[1] <= 0.
Bit 11 of lt0 is 0. Implies result[2] >= 0.
Bit 15 of lt0 is 0. Implies result[3] >= 0.
There is a reason why we look at bits 3, 7, 11 and 15, and a reason why we use the magic constants 8 and 0x88: PMOVMSKB returns one sign bit per byte, not one sign bit per dword. The bits we are actually interested in are therefore surrounded by junk bits that we must ignore; only the sign bit of the top byte of each integer interests us.
In total this makes 9-10 instructions to run the check.
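To illustrate the junk bits mentioned above, here is a small sketch that keeps only bits 3, 7, 11 and 15 of a PMOVMSKB result. The helper name dword_sign_bits is hypothetical and is not part of either alternative.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: PMOVMSKB yields 16 bits, one per byte. Masking with
   0x8888 keeps only the sign bit of the top byte of each 32-bit lane. */
static int dword_sign_bits(__m128i v)
{
    return _mm_movemask_epi8(v) & 0x8888;
}
```

For example, a lane holding 0x00FF0000 is positive, yet its third byte has its high bit set, so the raw PMOVMSKB result is nonzero for that lane while dword_sign_bits reports 0 for it.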