Efficiently find indices of 1-bits in large array, using SIMD

400 Views Asked by At

If I have very large array of bytes and want to find indices of all 1-bits, indices counting from leftmost bit, how do I do this efficiently, probably using SIMD.

(For finding the first 1-bit, see an earlier question. This question produces an array of outputs instead of 1 index.)


Of course I can do following 512-bit non-SIMD version using C++20:
Try it online!

#include <cstdint>
#include <iostream>
#include <bit>

int Find1s512(uint64_t const * p, uint16_t * idxs) {
    int rpos = 0;
    for (int i = 0; i < 8; ++i) {
        uint64_t n = p[i];
        while (true) {
            int const j = std::countr_zero(n);
            if (j >= 64)
                break;
            idxs[rpos++] = i * 64 + j;
            n &= n - 1;
        }
    }
    return rpos;
}

int main() {
    uint64_t a[8] = {(1ULL << 17) | (1ULL << 63),
        1ULL << 19, 1ULL << 23};
    uint16_t b[512] = {};
    int const cnt = Find1s512(a, b);
    for (int i = 0; i < cnt; ++i)
        std::cout << int(b[i]) << " ";
    // Printed result: 17 63 83 151
}

And use above 512-bit version as building block to collect 1-bit positions of whole large array.

But I'd like to find out what is the most efficient way to do this, especially using SIMD 128/256/512.

0

There are 0 best solutions below