PCMPGTQ was introduced in SSE4.2. It provides a signed greater-than comparison of packed 64-bit integers that yields a mask.
How does one support this functionality on instruction sets predating SSE4.2?
Update: This same question applies to ARMv7 with Neon which also lacks a 64-bit comparator. The sister question to this is found here: What is the most efficient way to support CMGT with 64bit signed comparisons on ARMv7a with Neon?
We have 32-bit signed comparison intrinsics, so split the packed qwords into dword pairs.
If the high dword in `a` is greater than the high dword in `b`, then there is no need to compare the low dwords.

If the high dword in `a` is equal to the high dword in `b`, then a 64-bit subtract (`b - a`) will either clear or set all 32 high bits of the result. Because the high dwords are equal they "cancel" each other out, so the subtract is effectively an unsigned compare of the low dwords, with the result placed in the high dword.

Copy the comparison mask in the high 32 bits to the low 32 bits.
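
The steps above can be sketched with SSE2 intrinsics roughly as follows. This is a minimal sketch rather than a tuned implementation: the function name `pcmpgtq_sse2` and the helper `cmpgt_i64x2` are illustrative names, and the subtract is assumed to be `b - a` so that a borrow out of the low dword marks `a > b`.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Signed 64-bit greater-than using only SSE2.
   Each qword lane of the result is all-ones where a > b, else zero. */
static __m128i pcmpgtq_sse2(__m128i a, __m128i b)
{
    /* Where the dwords are equal, keep the matching dword of b - a.
       With equal high dwords, the high dword of b - a is all-ones exactly
       when the low dword of a is unsigned-greater than that of b. */
    __m128i r = _mm_and_si128(_mm_cmpeq_epi32(a, b), _mm_sub_epi64(b, a));

    /* Where the high dword of a is signed-greater, set the mask outright. */
    r = _mm_or_si128(r, _mm_cmpgt_epi32(a, b));

    /* Only the high dword of each qword is meaningful at this point;
       copy it over the low dword so the whole qword is a mask. */
    return _mm_shuffle_epi32(r, _MM_SHUFFLE(3, 3, 1, 1));
}

/* Hypothetical convenience wrapper: compares two pairs of int64_t values
   and stores the two lane masks (-1 or 0) in out. */
static void cmpgt_i64x2(const int64_t a[2], const int64_t b[2], int64_t out[2])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, pcmpgtq_sse2(va, vb));
}
```

The low dword of the intermediate `r` holds garbage after the first two steps, which is why the final shuffle overwrites it instead of combining it with anything.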
Updated: Here's the Godbolt for SSE2 and ARMv7+Neon.