How to write a SWAR comparison which puts 0xFF in a lane on matches?

210 Views Asked by At

I'm trying to write a SWAR compare-for-equality operation, working on uint64_t pretending to be 8 'lanes' of uint8_t. The closest I've managed to achieve, based on techniques in Hacker's Delight and Bit Twiddling Hacks, is the following:

uint64_t compare_eq (uint64_t x, uint64_t y) {
    uint64_t xored = x ^ y;
    uint64_t mask = 0x7F * 0x0101010101010101ULL;
    uint64_t tmp = (xored & mask) + mask;
    return ~(tmp | xored | mask);
}

However, this puts 0x80 into 'lanes' which match, and 0x00 into 'lanes' that don't, whereas I want 0xFF in 'lanes' that match, and 0x00 in 'lanes' that don't. Is it possible to write this without branching?

2

There are 2 best solutions below

3
On BEST ANSWER

For the record, this is just a variant for calculating a high bit in nonzero bytes (one instruction less) put together with the comments from @njuffa and @Nate Eldredge (probably slightly more efficient than in 4386427's answer).

uint64_t compare_eq (uint64_t x, uint64_t y) {
    uint64_t xored = x ^ y;
    uint64_t mask = ((((xored >> 1) | 0x8080808080808080) - xored) & 0x8080808080808080);
    return (mask << 1) - (mask >> 7);
}
0
On

To start with there is a bug (a typo?) in the posted code:

uint64_t mask = 0x7F * 0x0101010101010101ULL;
                       ^^
                    Missing 0x

Once you have either 0x80 or 0x00 in the lanes, you can divide by 0x80 and multiply by 0xff.

Like:

uint64_t compare_eq (uint64_t x, uint64_t y) {
    uint64_t xored = x ^ y;
    uint64_t mask = 0x7F * 0x0101010101010101ULL;
    uint64_t tmp = (xored & mask) + mask;
    uint64_t res = ~(tmp | xored | mask);
    res = res / 0x80;
    res = res * 0xff;
    return res;
}