Given a 64-bit general-purpose register (not an XMM register) on the x64 architecture, filled with one-byte unsigned values, how can I check all of them for a zero value at once, without using SSE instructions?
Is there a way to do this in parallel, without iterating over the register in 8-bit steps?
I tried comparing it against various 64-bit masks, but that did not work.
Technically, you could do something like this:
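A minimal sketch of one scalar way to do it, assuming the goal is to detect whether any of the eight bytes is zero. The function name `any_zero_byte` is made up for illustration, and this is not necessarily the exact code the instruction count in this answer refers to. It ORs all bits of every byte into that byte's lowest bit, then checks whether any of those lowest bits stayed clear:

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: returns true if at least one of the 8 bytes in v is zero.
bool any_zero_byte(uint64_t v)
{
    // Fold the bits of each byte into its lowest bit.
    // After the three steps, bit i holds the OR of original bits i..i+7,
    // so bits 0, 8, 16, ... hold the OR of exactly one byte each.
    v |= v >> 4;
    v |= v >> 2;
    v |= v >> 1;

    // Keep only the lowest bit of every byte.
    const uint64_t low_bits = 0x0101010101010101ull;
    v &= low_bits;

    // A byte was zero if and only if its lowest bit is still clear.
    return v != low_bits;
}
```

For example, `any_zero_byte(0x1122334455660088)` is true because the second-lowest byte is `0x00`, while `any_zero_byte(0x0101010101010101)` is false. How many instructions this becomes depends on the compiler and its flags.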
However, SIMD is going to be way faster. SSE is a baseline requirement of the x64 architecture: every AMD64 processor is required to support SSE1 and SSE2. Here’s an SSE2 version:
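A sketch of what an SSE2 version can look like, again assuming "any byte is zero" is the question and using a made-up function name, `any_zero_byte_sse2`. It moves the 64-bit value into the low half of an XMM register, compares every byte against zero, and collects the comparison results into a bit mask. The intrinsics are the standard SSE2 ones from `<emmintrin.h>`; `_mm_cvtsi64_si128` is only available when compiling for 64-bit targets:

```c
#include <emmintrin.h> // SSE2 intrinsics
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: returns true if at least one of the 8 bytes in v is zero.
bool any_zero_byte_sse2(uint64_t v)
{
    // Move the value into the low 8 bytes of a vector; the upper 8 bytes become zero.
    const __m128i vec = _mm_cvtsi64_si128((long long)v);

    // Each byte lane becomes 0xFF where the input byte equals zero, 0x00 otherwise.
    const __m128i eq = _mm_cmpeq_epi8(vec, _mm_setzero_si128());

    // Gather the top bit of every lane into a 16-bit mask.
    const int mask = _mm_movemask_epi8(eq);

    // The upper 8 lanes are always zero, so only the low 8 mask bits are meaningful.
    return (mask & 0xFF) != 0;
}
```

The `& 0xFF` matters: without it the zero-filled upper half of the vector would always report a match.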
That’s 6 instructions instead of 16: link.