I'm trying to use uint64_t as if it were 8 lanes of uint8_ts; my goal is to implement a lane-by-lane less-than. This operation, given x and y, should produce a result with 0xFF in a lane if the value for the corresponding lane in x is less than the value for that lane in y, and 0x00 otherwise. A lane-by-lane less-than-or-equal would also work.
Based on what I've seen, I'm guessing I would need a lane-wise difference-or-zero operation (defined as doz(x, y) = if (x < y) then 0 else (x - y)), and then to use that to construct a selection mask. However, all the lane-wise subtraction approaches I've seen are signed, and I'm not sure how I would use them for this kind of task.
Is there a way I could do this, using difference-or-zero or some other way?
I came up with
The logic goes pretty much the same path as in Harold's; the difference is in interpreting the top bits as
If one can work with an inverted mask (i.e. b >= a), then the last ^H can be omitted as well. The instruction counts of the results, using clang on godbolt for arm64 and x64, would be with and (without) sign_to_mask.
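For reference, here is a minimal sketch of one way to get an unsigned lane-wise less-than on a uint64_t. It is not necessarily the exact code discussed above. It assumes H is the constant 0x8080808080808080 (the top bit of every lane), and the body of sign_to_mask shown here is just one possible implementation; lanes_lt is a hypothetical name. The idea is to compare the low 7 bits of each lane with a single subtraction that cannot borrow across lanes, and then settle the result with the top bits separately.

```c
#include <stdint.h>

/* Assumed constant: the top (sign) bit of each of the 8 byte lanes. */
#define H 0x8080808080808080ULL

/* Expand a value that has only bit 7 of each lane possibly set
   into 0xFF / 0x00 per lane. One possible implementation: move the
   bit down to bit 0 of its lane, then multiply by 0xFF. Each lane's
   product is at most 0xFF, so nothing carries into the next lane. */
static inline uint64_t sign_to_mask(uint64_t s)
{
    return (s >> 7) * 0xFFULL;
}

/* Lane-wise unsigned x < y: 0xFF in each lane where x's byte is
   less than y's byte, 0x00 otherwise. */
static inline uint64_t lanes_lt(uint64_t x, uint64_t y)
{
    /* Force bit 7 of every x lane to 1 and clear bit 7 of every y lane.
       Each lane then computes (0x80 + xlow) - ylow, which is at least 1,
       so no lane borrows from its neighbour. Bit 7 of each lane of t
       is set iff xlow >= ylow (low 7 bits only). */
    uint64_t t = (x | H) - (y & ~H);

    /* x < y when the top bits say so (x top = 0, y top = 1), or when
       the top bits are equal and the low 7 bits of x are smaller. */
    uint64_t lt = (~x & y) | (~(x ^ y) & ~t);

    return sign_to_mask(lt & H);
}
```

For example, lanes_lt(0x007F80FF051001FE, 0x01807FFF050F02FF) yields 0xFFFF00000000FFFF: the lanes 00<01, 7F<80, 01<02 and FE<FF compare true, while 80<7F, FF<FF, 05<05 and 10<0F compare false. If only the sign bits are needed (for instance to feed a later blend), the sign_to_mask step can be skipped and lt & H used directly.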