Is there any way to perform a comparison like C >= (A + B) with SSE2/SSE4.1 instructions, given that 16-bit unsigned addition (_mm_add_epi16()) can overflow?
The code looks like this:
#define _mm_cmpge_epu16(a, b) _mm_cmpeq_epi16(_mm_max_epu16(a, b), a) /* SSE4.1: unsigned a >= b */
const __m128i *a = (const __m128i *)ptr1;
const __m128i *b = (const __m128i *)ptr2;
const __m128i *c = (const __m128i *)ptr3;
__m128i xa = _mm_lddqu_si128(a);
__m128i xb = _mm_lddqu_si128(b);
__m128i xc = _mm_lddqu_si128(c);
__m128i res = _mm_add_epi16(xa, xb);      /* wraps around on overflow */
__m128i xmm3 = _mm_cmpge_epu16(xc, res);
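A scalar model of one 16-bit lane makes the false positive concrete. The helper names `ge_wrapped` and `ge_exact` are hypothetical and exist only for illustration: the first mimics the wrapping `_mm_add_epi16` followed by the unsigned compare, the second does the addition in 32 bits so it cannot wrap.

```c
#include <stdint.h>

/* Scalar model of one lane of the code above: the sum wraps modulo 2^16,
   just like _mm_add_epi16, and is then compared unsigned. */
static int ge_wrapped(uint16_t c, uint16_t a, uint16_t b)
{
    uint16_t sum = (uint16_t)(a + b);   /* wraps on overflow */
    return c >= sum;
}

/* Reference result: add in 32 bits, so the true sum (at most 0x1FFFE)
   is never truncated. */
static int ge_exact(uint16_t c, uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    return (uint32_t)c >= sum;
}
```

For example, with C = 5, A = 0xFFFF and B = 2 the wrapped sum is 1, so `ge_wrapped` reports true, while the true sum 0x10001 is larger than 5 and `ge_exact` reports false.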
The issue is that when the 16-bit addition overflows (wraps around), the greater-than-or-equal comparison gives false positives. I can't use saturated addition for my purpose. I have looked at a mechanism for detecting overflow in unsigned addition here: SSE2 integer overflow checking. But how do I use it for a greater-than-or-equal comparison?
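A common overflow test for unsigned addition is that the wrapped sum ends up smaller than an operand: if A + B carries out of 16 bits, the stored result is A + B - 0x10000, which is less than A because B < 0x10000. A minimal SSE2-only sketch of that test follows; the helper name `no_carry_epu16` is hypothetical, and this is not necessarily the exact method from the linked question.

```c
#include <emmintrin.h>  /* SSE2 */

/* Returns 0xFFFF in each 16-bit lane where a + b does NOT carry out of
   16 bits, and 0x0000 where it does.
   No carry <=> wrapped sum >= a. SSE2 has no unsigned 16-bit compare, so
   use saturating subtraction: max(a - sum, 0) is zero exactly when a <= sum. */
static __m128i no_carry_epu16(__m128i a, __m128i b)
{
    __m128i sum  = _mm_add_epi16(a, b);        /* wrapping add */
    __m128i diff = _mm_subs_epu16(a, sum);     /* max(a - sum, 0) */
    return _mm_cmpeq_epi16(diff, _mm_setzero_si128());
}
```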
You build the missing primitives from what you have available in the instruction set. Here’s one possible implementation, untested.
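One way to combine such primitives is sketched below, assuming only SSE2 is available; the helper names are hypothetical. It computes the wrapped sum, builds an unsigned >= from saturating subtraction, and ANDs the "C >= sum" mask with a "no carry" mask. A carry means the true sum is at least 0x10000, which no 16-bit C can reach, so those lanes must come out false.

```c
#include <emmintrin.h>  /* SSE2 */

/* Unsigned 16-bit a >= b per lane, SSE2 only:
   _mm_subs_epu16(b, a) is max(b - a, 0), which is zero exactly when b <= a. */
static __m128i cmpge_epu16_sse2(__m128i a, __m128i b)
{
    return _mm_cmpeq_epi16(_mm_subs_epu16(b, a), _mm_setzero_si128());
}

/* Per lane: 0xFFFF where c >= a + b (true sum, no wrap-around), else 0x0000. */
static __m128i cmpge_sum_epu16(__m128i c, __m128i a, __m128i b)
{
    __m128i sum      = _mm_add_epi16(a, b);         /* may wrap */
    __m128i no_carry = cmpge_epu16_sse2(sum, a);    /* sum >= a <=> no carry */
    __m128i c_ge_sum = cmpge_epu16_sse2(c, sum);    /* exact when no carry */
    return _mm_and_si128(c_ge_sum, no_carry);
}
```

With SSE4.1, `cmpge_epu16_sse2` could be swapped for the question's `_mm_max_epu16`-based `_mm_cmpge_epu16` macro without changing the logic. Saturating addition alone would not be enough: a saturated sum of 0xFFFF still compares true against C = 0xFFFF. Zero-extending to 32-bit lanes with `_mm_unpacklo_epi16`/`_mm_unpackhi_epi16` is another option, at the cost of handling two registers per input vector.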