I'm trying to write a SWAR compare-for-equality operation, working on uint64_t pretending to be 8 'lanes' of uint8_t. The closest I've managed to achieve, based on techniques in Hacker's Delight and Bit Twiddling Hacks, is the following:
uint64_t compare_eq (uint64_t x, uint64_t y) {
    uint64_t xored = x ^ y;
    uint64_t mask = 0x7F * 0x0101010101010101ULL;
    uint64_t tmp = (xored & mask) + mask;
    return ~(tmp | xored | mask);
}
However, this puts 0x80 into 'lanes' which match, and 0x00 into 'lanes' that don't, whereas I want 0xFF in 'lanes' that match, and 0x00 in 'lanes' that don't. Is it possible to write this without branching?
 
                        
For the record, this is just a variant for calculating a high bit in nonzero bytes (one instruction less) put together with the comments from @njuffa and @Nate Eldredge (probably slightly more efficient than in 4386427's answer).