1) Is there a way to efficiently implement a sign function using SSE3 (no SSE4) with the following characteristics?
- the input is a float vector (__m128)
- the output should also be an __m128, with each value being -1.0f, 0.0f or 1.0f
I tried this, but it didn't work (though I think it should):
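__m128 inputVal = _mm_set_ps(-0.5f, 0.5f, 0.0f, 3.0f);
__m128 comp1 = _mm_cmpgt_ps(_mm_setzero_ps(), inputVal); // all bits set where inputVal < 0
__m128 comp2 = _mm_cmpgt_ps(inputVal, _mm_setzero_ps()); // all bits set where inputVal > 0
comp1 = _mm_castsi128_ps(_mm_castps_si128(comp1)); // bit-level reinterpretation only, no value conversion
comp2 = _mm_castsi128_ps(_mm_castps_si128(comp2));
__m128 signVal = _mm_sub_ps(comp1, comp2);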
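Why the attempt above fails: _mm_cmpgt_ps does not return 1.0f or 0.0f but a per-lane bit mask (0xFFFFFFFF or 0). Read as a float, 0xFFFFFFFF is a NaN, and the _mm_castps_si128/_mm_castsi128_ps pair only reinterprets bits, so it changes nothing; _mm_sub_ps then yields NaN in every lane whose input is nonzero. A minimal sketch of a fix, assuming only SSE1 intrinsics (the function name sign_ps_fixed is hypothetical): AND each mask with 1.0f so it becomes 1.0f or 0.0f, then subtract.

```c
#include <xmmintrin.h> /* SSE: __m128, _mm_cmpgt_ps, _mm_and_ps, _mm_sub_ps */

/* Hypothetical helper: sign of each lane as -1.0f, 0.0f or +1.0f. */
static inline __m128 sign_ps_fixed(__m128 x)
{
    const __m128 zero = _mm_setzero_ps();
    const __m128 one  = _mm_set1_ps(1.0f);
    /* Turn the all-ones comparison masks into 1.0f (and all-zeros into 0.0f). */
    __m128 neg = _mm_and_ps(_mm_cmpgt_ps(zero, x), one); /* 1.0f where x < 0 */
    __m128 pos = _mm_and_ps(_mm_cmpgt_ps(x, zero), one); /* 1.0f where x > 0 */
    /* pos - neg is +1 for x > 0, -1 for x < 0, 0 otherwise. The operand order
     * is reversed relative to the attempt, because the masks now hold +1.0f. */
    return _mm_sub_ps(pos, neg);
}
```

With the question's input, _mm_set_ps(-0.5f, 0.5f, 0.0f, 3.0f), the lanes in memory order (lane 0 first) come out as 1.0f, 0.0f, 1.0f, -1.0f, since _mm_set_ps lists lanes from highest to lowest.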
2) Is there a way to create a "flag" function (I'm not sure what the right name is)? Namely, if A > B the result should be 1, and 0 otherwise. The result should be floating-point (__m128), just like the input.
UPDATE: It seems Cory Nelson's answer will work here:
__m128 greaterThanFlag = _mm_and_ps(_mm_cmpgt_ps(valA, valB), _mm_set1_ps(1.0f));
__m128 lessThanFlag = _mm_and_ps(_mm_cmplt_ps(valA, valB), _mm_set1_ps(1.0f));
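The update's approach generalizes to any SSE comparison: the mask is all ones in lanes where the comparison holds and all zeros elsewhere, so ANDing it with 1.0f gives a 1.0f/0.0f flag. A small self-contained sketch wrapping the two lines above (the helper names gt_flag_ps and lt_flag_ps are hypothetical):

```c
#include <xmmintrin.h> /* SSE: _mm_cmpgt_ps, _mm_cmplt_ps, _mm_and_ps, _mm_set1_ps */

/* Hypothetical helper: 1.0f in each lane where a > b, 0.0f otherwise. */
static inline __m128 gt_flag_ps(__m128 a, __m128 b)
{
    return _mm_and_ps(_mm_cmpgt_ps(a, b), _mm_set1_ps(1.0f));
}

/* Hypothetical helper: 1.0f in each lane where a < b, 0.0f otherwise. */
static inline __m128 lt_flag_ps(__m128 a, __m128 b)
{
    return _mm_and_ps(_mm_cmplt_ps(a, b), _mm_set1_ps(1.0f));
}
```

Both comparisons are ordered, so a lane where either input is NaN compares false and yields 0.0f.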
The first approach that comes to mind is perhaps the simplest:
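A minimal float sketch of one simple approach, not necessarily the exact code this answer had in mind and assuming only SSE1 intrinsics (the name sign_ps is hypothetical): take ±1.0f from the input's sign bit, then zero the lanes that compare equal to zero. It needs three bitwise operations and one comparison, with no subtraction.

```c
#include <xmmintrin.h> /* SSE: _mm_and_ps, _mm_or_ps, _mm_cmpneq_ps, _mm_set1_ps */

/* Hypothetical helper: sign(x) per lane as -1.0f, 0.0f or +1.0f. */
static inline __m128 sign_ps(__m128 x)
{
    const __m128 sign_bit = _mm_set1_ps(-0.0f); /* only bit 31 set */
    const __m128 one      = _mm_set1_ps(1.0f);
    /* Copy the sign bit of x onto 1.0f: +1.0f or -1.0f in every lane. */
    __m128 unit = _mm_or_ps(_mm_and_ps(x, sign_bit), one);
    /* All ones where x != 0; -0.0f compares equal to 0, so it maps to 0.0f. */
    __m128 nonzero = _mm_cmpneq_ps(x, _mm_setzero_ps());
    return _mm_and_ps(unit, nonzero);
}
```

One difference from the comparison-and-subtract form: _mm_cmpneq_ps is true for NaN, so a NaN lane comes out as +1.0f or -1.0f depending on the NaN's sign bit rather than 0.0f.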
Or, if you misspoke and intended to get an integer result:
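A sketch of the integer variant, assuming an __m128i holding -1, 0 or +1 per lane is acceptable (the name sign_epi32_from_ps is hypothetical). It keeps the subtraction from the original attempt but performs it on the masks as 32-bit integers with SSE2's _mm_sub_epi32, where an all-ones mask is simply -1.

```c
#include <emmintrin.h> /* SSE2: __m128i, _mm_castps_si128, _mm_sub_epi32 */

/* Hypothetical helper: sign(x) per lane as a 32-bit integer -1, 0 or +1. */
static inline __m128i sign_epi32_from_ps(__m128 x)
{
    const __m128 zero = _mm_setzero_ps();
    /* Comparison masks reinterpreted as integers: -1 where true, 0 where false. */
    __m128i lt = _mm_castps_si128(_mm_cmplt_ps(x, zero)); /* -1 where x < 0 */
    __m128i gt = _mm_castps_si128(_mm_cmpgt_ps(x, zero)); /* -1 where x > 0 */
    /* lt - gt: -1 for x < 0, +1 for x > 0, 0 otherwise. */
    return _mm_sub_epi32(lt, gt);
}
```

If a float result is still needed, passing this value through _mm_cvtepi32_ps (also SSE2) gives -1.0f, 0.0f or 1.0f.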