PCMPGTQ was introduced in SSE4.2, and it provides a signed greater-than comparison for 64-bit numbers that yields a mask.
How does one support this functionality on instruction sets predating SSE4.2?
Update: This same question applies to ARMv7 with Neon, which also lacks a 64-bit comparator. The sister question to this is found here: What is the most efficient way to support CMGT with 64bit signed comparisons on ARMv7a with Neon?
We have 32-bit signed comparison intrinsics, so split the packed qwords into dword pairs.
- If the high dword in `a` is greater than the high dword in `b`, then there is no need to compare the low dwords.
- If the high dword in `a` is equal to the high dword in `b`, then a 64-bit subtract will either clear or set all 32 high bits of the result (if the high dwords are equal then they "cancel" each other out, effectively an unsigned compare of the low dwords, placing the result in the high dwords).
- Copy the comparison mask in the high 32 bits to the low 32 bits.
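The steps above can be sketched with SSE2 intrinsics roughly as follows. This is a minimal sketch, not necessarily the exact code behind the Godbolt link: the function names `cmpgt_epi64_sse2` and `cmpgt_epi64_matches_scalar` are made up for illustration, and it assumes a compiler that provides `_mm_set_epi64x` (GCC and Clang on x86-64 do).

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical name: emulates PCMPGTQ (signed 64-bit a > b) using SSE2 only. */
static __m128i cmpgt_epi64_sse2(__m128i a, __m128i b)
{
    /* Where the high dwords are equal, keep the 64-bit difference b - a.
       Its high dword is all ones exactly when a_lo > b_lo (unsigned),
       because the low-dword subtraction borrows from the high dword. */
    __m128i r = _mm_and_si128(_mm_cmpeq_epi32(a, b), _mm_sub_epi64(b, a));

    /* Where the high dword of a is greater (signed), set the high dword. */
    r = _mm_or_si128(r, _mm_cmpgt_epi32(a, b));

    /* Copy each high dword over its low dword; the low dwords held scratch. */
    return _mm_shuffle_epi32(r, _MM_SHUFFLE(3, 3, 1, 1));
}

/* Hypothetical helper: returns 1 if both lanes agree with a scalar a > b. */
static int cmpgt_epi64_matches_scalar(int64_t a0, int64_t a1,
                                      int64_t b0, int64_t b1)
{
    int64_t out[2];
    __m128i a = _mm_set_epi64x(a1, a0);  /* lane 0 = a0, lane 1 = a1 */
    __m128i b = _mm_set_epi64x(b1, b0);
    _mm_storeu_si128((__m128i *)out, cmpgt_epi64_sse2(a, b));
    return out[0] == (a0 > b0 ? -1 : 0) && out[1] == (a1 > b1 ? -1 : 0);
}
```

The final shuffle is what makes the low dwords irrelevant: only the high dword of each qword carries the answer after the OR, so it is broadcast over the whole 64-bit lane to form the mask.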
Updated: Here's the Godbolt for SSE2 and ARMv7+Neon.