I am vectorizing a piece of code and at some point I have the following setup:
register __m128i a = { 99,99,99,99,99,99,99,99 }
register __m128i b = { 100,50,119,30,99,40,50,20 }
I am currently packing shorts in these registers, which is why I have 8 values per register. What I would like to do is subtract the corresponding value in a from the i'th element of b if the i'th value of b is greater than or equal to the value in a (in this case, a is filled with the constant 99). To this end, I first use a greater-than-or-equal comparison between b and a, which yields, for this example:
register __m128i c = { 1,0,1,0,1,0,0,0 }
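On SSE2 (available on Sandy Bridge), a packed compare does not produce 1 in the true lanes. It produces all ones (0xFFFF, which reads as -1 in a short). There is also no `_mm_cmpge_epi16`, so "greater than or equal" has to be built from `>` and `==`. The following is a minimal sketch assuming signed 16-bit lanes. `cmpge_epi16` and `ge_mask` are hypothetical helper names, and the array load/store exists only to make the result easy to inspect:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Per-lane b >= a for signed 16-bit values: (b > a) | (b == a).
   True lanes come out as 0xFFFF (-1), false lanes as 0. */
static __m128i cmpge_epi16(__m128i b, __m128i a)
{
    return _mm_or_si128(_mm_cmpgt_epi16(b, a), _mm_cmpeq_epi16(b, a));
}

/* Hypothetical test helper: writes the mask for b_in[i] >= a_val into out. */
static void ge_mask(const short b_in[8], short a_val, short out[8])
{
    __m128i a = _mm_set1_epi16(a_val);                     /* broadcast the constant */
    __m128i b = _mm_loadu_si128((const __m128i *)b_in);    /* 8 shorts, unaligned load */
    _mm_storeu_si128((__m128i *)out, cmpge_epi16(b, a));
}
```

For the example data, the mask is { -1,0,-1,0,-1,0,0,0 } rather than { 1,0,1,0,1,0,0,0 }.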
To complete the operation, I'd like to use a multiply-and-subtract, i.e. store in b the result of b -= a*c. The result would then be:
b = { 1,50,20,30,0,40,50,20 }
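For 16-bit integers, the literal b -= a*c does not need a fused instruction. SSE2 already has a 16-bit multiply (`_mm_mullo_epi16`) and subtract (`_mm_sub_epi16`), and both are usable on Sandy Bridge. The sketch below assumes signed 16-bit lanes and uses the hypothetical helper name `mul_sub`. A logical right shift by 15 turns the all-ones compare lanes into the 0/1 values the question expects:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: b_io[i] -= a_val * c[i], where c[i] = (b_io[i] >= a_val). */
static void mul_sub(short b_io[8], short a_val)
{
    __m128i a = _mm_set1_epi16(a_val);
    __m128i b = _mm_loadu_si128((const __m128i *)b_io);

    /* b >= a  <=>  !(a > b); ~gt & all-ones gives 0xFFFF where b >= a */
    __m128i gt = _mm_cmpgt_epi16(a, b);
    __m128i ge = _mm_andnot_si128(gt, _mm_set1_epi16(-1));

    /* logical shift right by 15: 0xFFFF -> 1, 0 -> 0 */
    __m128i c = _mm_srli_epi16(ge, 15);

    /* separate multiply (low 16 bits) and subtract, no fused op needed */
    b = _mm_sub_epi16(b, _mm_mullo_epi16(a, c));
    _mm_storeu_si128((__m128i *)b_io, b);
}
```

The multiply is only there to reproduce the formula literally. Because c is either 0 or 1, an AND with the mask produces the same result more cheaply.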
Is there any operation that does such a thing? What I found were fused operations for Haswell, but I am currently working on Sandy Bridge. Also, if someone has a better idea, please let me know (e.g. I could do a logical subtract: if c holds 1 in a lane, subtract; otherwise leave that lane unchanged).
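The "logical subtract" idea can be expressed with an AND instead of a multiply. The compare mask is all ones in the selected lanes, so `a & mask` is a where the subtraction should happen and 0 elsewhere. The sketch below assumes signed 16-bit lanes and SSE2. It derives the mask from `max(a, b) == b`, which holds exactly when b >= a. `max_masked_sub` is a hypothetical name:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: subtract a_val from the lanes where b_io[i] >= a_val. */
static void max_masked_sub(short b_io[8], short a_val)
{
    __m128i a = _mm_set1_epi16(a_val);
    __m128i b = _mm_loadu_si128((const __m128i *)b_io);

    /* b >= a exactly when max(a, b) == b (signed 16-bit max) */
    __m128i ge = _mm_cmpeq_epi16(_mm_max_epi16(a, b), b);

    /* AND keeps a in the selected lanes and 0 elsewhere, then subtract */
    b = _mm_sub_epi16(b, _mm_and_si128(a, ge));
    _mm_storeu_si128((__m128i *)b_io, b);
}
```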
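You can copy b to c, subtract a from c, perform an arithmetic shift right by 15 positions on each 16-bit value, complement c, mask c with a, and finally subtract c from b. After the shift, each lane of c is all ones where b < a and zero where b >= a. After complementing and masking, c therefore holds a exactly in the lanes that need the subtraction and 0 everywhere else. I'm not familiar with the intrinsics syntax, but those are the steps.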
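The shift-based steps above map onto SSE2 intrinsics roughly as follows. This is a sketch that assumes signed 16-bit lanes and that b - a never overflows a short, because the sign bit of b - a is what selects the lanes. `shift_sub` is a hypothetical name. `_mm_andnot_si128(c, a)` computes `~c & a`, so it performs the "complement" and "mask" steps in one instruction:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper implementing: c = b - a; c >>= 15 (arithmetic);
   c = ~c & a; b -= c. */
static void shift_sub(short b_io[8], short a_val)
{
    __m128i a = _mm_set1_epi16(a_val);
    __m128i b = _mm_loadu_si128((const __m128i *)b_io);

    __m128i c = _mm_sub_epi16(b, a);   /* copy b to c and subtract a */
    c = _mm_srai_epi16(c, 15);         /* 0xFFFF where b < a, 0 where b >= a */
    c = _mm_andnot_si128(c, a);        /* complement c, then mask with a */
    b = _mm_sub_epi16(b, c);           /* subtract c from b */

    _mm_storeu_si128((__m128i *)b_io, b);
}
```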
Here is an alternative with fewer steps:
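One shorter sequence uses a single "less than" compare and folds the inversion into `_mm_andnot_si128`. The result is three vector operations and no separate complement. This is a sketch assuming signed 16-bit lanes and SSE2, which provides `_mm_cmplt_epi16`. `cmplt_sub` is a hypothetical name:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: subtract a_val from the lanes where b_io[i] >= a_val. */
static void cmplt_sub(short b_io[8], short a_val)
{
    __m128i a = _mm_set1_epi16(a_val);
    __m128i b = _mm_loadu_si128((const __m128i *)b_io);

    __m128i lt = _mm_cmplt_epi16(b, a);            /* 0xFFFF where b < a */
    b = _mm_sub_epi16(b, _mm_andnot_si128(lt, a)); /* ~lt & a: a where b >= a */

    _mm_storeu_si128((__m128i *)b_io, b);
}
```

Unlike the shift-based version, this does not depend on b - a staying within the range of a short.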