I previously asked a question about the vclt_s8 comparison: does anybody know how to use the Neon intrinsic uint8x8_t vclt_s8(int8x8_t, int8x8_t)?
However, suppose we have code like this:
if (a > b + c) {
    a = b + c;
} else if (a < b - c) {
    a = b - c;
}
How can I transform it into Neon intrinsics? It seems that we cannot process 8 elements in parallel in such a case. Is that right?
Obviously you can't branch per element with SIMD, so you have to look at how to implement this kind of logic in a branchless way, using masks. The general idea: compute b + c and b - c for all lanes, compare a against each to get a mask, then use each mask to select between the new value and the old a. Coding this should be fairly straightforward.
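A minimal sketch of that mask-and-select idea in C follows. The function names (clamp_s8x8, clamp_lane, select_u8, sat_s8) are made up for illustration. Two things here are assumptions rather than anything stated in the question: the sums use saturating arithmetic (vqadd_s8 / vqsub_s8) so that b + c cannot wrap in 8 bits, and the second comparison is done against the already-updated a. The Neon part only compiles on a Neon target; the one-lane model underneath mirrors the same steps in plain C so the logic can be checked anywhere.

```c
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>

/* Eight int8 lanes at once; per lane this behaves like two independent ifs. */
static int8x8_t clamp_s8x8(int8x8_t a, int8x8_t b, int8x8_t c)
{
    int8x8_t hi = vqadd_s8(b, c);   /* b + c, saturating (assumption) */
    int8x8_t lo = vqsub_s8(b, c);   /* b - c, saturating (assumption) */

    uint8x8_t gt = vcgt_s8(a, hi);  /* lane is 0xFF where a > hi, else 0x00 */
    a = vbsl_s8(gt, hi, a);         /* take hi where the mask is set, else keep a */

    uint8x8_t lt = vclt_s8(a, lo);  /* compared against the updated a */
    a = vbsl_s8(lt, lo, a);         /* take lo where the mask is set, else keep a */
    return a;
}
#endif

/* Bitwise select on one byte: bits of x where mask is set, bits of y elsewhere. */
static uint8_t select_u8(uint8_t mask, uint8_t x, uint8_t y)
{
    return (uint8_t)((x & mask) | (y & (uint8_t)~mask));
}

/* Saturate an int to the int8_t range, like the vq* intrinsics do per lane. */
static int8_t sat_s8(int v)
{
    return (int8_t)(v > INT8_MAX ? INT8_MAX : (v < INT8_MIN ? INT8_MIN : v));
}

/* Portable one-lane model of the same steps, with no branch on the data path. */
static int8_t clamp_lane(int8_t a, int8_t b, int8_t c)
{
    int8_t hi = sat_s8(b + c);
    int8_t lo = sat_s8(b - c);

    uint8_t gt = (a > hi) ? 0xFF : 0x00;
    a = (int8_t)select_u8(gt, (uint8_t)hi, (uint8_t)a);

    uint8_t lt = (a < lo) ? 0xFF : 0x00;
    a = (int8_t)select_u8(lt, (uint8_t)lo, (uint8_t)a);
    return a;
}
```

To use the Neon version on an array, load 8 bytes at a time with vld1_s8, call clamp_s8x8, and store the result with vst1_s8. The 128-bit variants (vqaddq_s8, vcgtq_s8, vbslq_s8 and so on) handle 16 lanes per call in the same way.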
Note that I've cheated a little here and omitted the else from your scalar code (assuming that the two branches are mutually exclusive), so what I've implemented is actually equivalent to two independent if statements rather than an if / else if chain. If this is a bad assumption then you'll need to do some additional bitwise operations to implement the logical else.
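For what it's worth, the two conditions can only both be true when c is negative (so that b + c < b - c); for non-negative c the else makes no difference. If you do need it, one possible way to get the else behaviour is to compute both masks from the original a and then clear the "less than" mask wherever the "greater than" mask is set. The sketch below uses hypothetical names (clamp_else_s8x8, clamp_else_lane, clamp_else_ref) and again assumes saturating arithmetic for b + c and b - c, which is my choice and not part of the question.

```c
#include <stdint.h>

/* Saturate an int to the int8_t range. */
static int8_t sat_s8(int v)
{
    return (int8_t)(v > INT8_MAX ? INT8_MAX : (v < INT8_MIN ? INT8_MIN : v));
}

/* Scalar reference: the original if / else if, with saturated bounds. */
static int8_t clamp_else_ref(int8_t a, int8_t b, int8_t c)
{
    int8_t hi = sat_s8(b + c);
    int8_t lo = sat_s8(b - c);
    if (a > hi) {
        a = hi;
    } else if (a < lo) {
        a = lo;
    }
    return a;
}

/* Portable one-lane model of the masked version. */
static int8_t clamp_else_lane(int8_t a, int8_t b, int8_t c)
{
    int8_t hi = sat_s8(b + c);
    int8_t lo = sat_s8(b - c);

    uint8_t gt = (a > hi) ? 0xFF : 0x00;
    uint8_t lt = (a < lo) ? 0xFF : 0x00;
    lt = (uint8_t)(lt & (uint8_t)~gt);   /* the "else": lt only where gt is clear */

    uint8_t r = (uint8_t)a;
    r = (uint8_t)((gt & (uint8_t)hi) | ((uint8_t)~gt & r));
    r = (uint8_t)((lt & (uint8_t)lo) | ((uint8_t)~lt & r));
    return (int8_t)r;
}

#if defined(__ARM_NEON)
#include <arm_neon.h>

/* Same logic on eight int8 lanes. */
static int8x8_t clamp_else_s8x8(int8x8_t a, int8x8_t b, int8x8_t c)
{
    int8x8_t hi = vqadd_s8(b, c);   /* b + c, saturating (assumption) */
    int8x8_t lo = vqsub_s8(b, c);   /* b - c, saturating (assumption) */

    uint8x8_t gt = vcgt_s8(a, hi);  /* both masks use the original a */
    uint8x8_t lt = vclt_s8(a, lo);
    lt = vbic_u8(lt, gt);           /* lt AND NOT gt */

    a = vbsl_s8(gt, hi, a);
    a = vbsl_s8(lt, lo, a);
    return a;
}
#endif
```

With a = 0, b = 0, c = -5 both comparisons are true; the masked version above yields -5, matching the if / else if, whereas without the vbic_u8 step the second select would overwrite the result with 5.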