I'm trying to use a uint64_t as if it were 8 lanes of uint8_t; my goal is to implement a lane-by-lane unsigned less-than. Given x and y, this operation should produce a result with 0xFF in a lane if the value in the corresponding lane of x is less than the value in that lane of y, and 0x00 otherwise. A lane-by-lane less-than-or-equal would also work.
Based on what I've seen, I'm guessing I would need a lanewise difference-or-zero operation (defined as doz(x, y) = if (x < y) then 0 else (x - y)), and then to use that to construct a selection mask. However, all the lane-wise subtraction approaches I've seen are signed, and I'm not sure how I would use them to do this kind of task.
Is there a way I could do this, using difference-or-zero or some other way?
 
                        
I came up with
The logic follows pretty much the same path as Harold's; the difference is in interpreting the top bits as
If one could work with an inverted mask (i.e. b >= a), then the last ^H can be omitted as well.
The instruction count of the results using clang / godbolt for arm64 and x64 would be with and (without) sign_to_mask.
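For readers who want something concrete to experiment with, here is a minimal sketch of a lanewise unsigned less-than in C. It is not a reconstruction of the code from this answer or from Harold's; it is one common SWAR formulation, and the names lanes_lt_u8, lanes_lt_u8_ref and msb_to_mask are made up for illustration. The idea is to compare the low 7 bits of each lane with a subtraction that cannot borrow across lanes, then combine that with the top bits to get the unsigned borrow in bit 7 of each lane, and finally widen that bit to a full 0xFF / 0x00 mask.

```c
#include <stdint.h>

/* Reference version: a plain per-lane loop that follows the spec from the
   question (0xFF where the lane of x is less than the lane of y). */
uint64_t lanes_lt_u8_ref(uint64_t x, uint64_t y)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i += 8) {
        uint8_t xl = (uint8_t)(x >> i);
        uint8_t yl = (uint8_t)(y >> i);
        if (xl < yl)
            r |= (uint64_t)0xFF << i;
    }
    return r;
}

/* Hypothetical helper: widen bit 7 of each lane to 0xFF or 0x00.
   The input must have only bits 7, 15, 23, ... set, so (m >> 7) is 0x01 or
   0x00 per lane and the multiply cannot carry from one lane into the next. */
uint64_t msb_to_mask(uint64_t m)
{
    return (m >> 7) * 0xFF;
}

/* SWAR version (an assumed formulation, not the original answer's code). */
uint64_t lanes_lt_u8(uint64_t x, uint64_t y)
{
    const uint64_t H = 0x8080808080808080ULL;

    /* Per lane: (low 7 bits of x, plus 128) minus (low 7 bits of y).
       Each lane result is in [1, 255], so no borrow crosses a lane boundary.
       Bit 7 of a lane of t is set iff x_low7 >= y_low7. */
    uint64_t t = (x | H) - (y & ~H);

    /* x < y in a lane iff the top bit of x is 0 and the top bit of y is 1,
       or the top bits are equal and x_low7 < y_low7 (bit 7 of t clear). */
    uint64_t lt = (~x & y) | (~(x ^ y) & ~t);

    return msb_to_mask(lt & H);
}
```

With this sketch, the inverted mask (x >= y per lane) is simply the bitwise NOT of the result, and a lanewise less-than-or-equal can be obtained as ~lanes_lt_u8(y, x).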