My problem is as follows. I am generating a random bitstring of size n, and need to iterate over the indices for which the random bit is 1. For example, if my random bitstring ends up being 00101, I want to retrieve [2, 4] (on which I will iterate over). The goal is to do so in the fastest way possible with Python/NumPy.
One of the fast methods is to use NumPy and do
bitstring = np.random.randint(2, size=(n,))
l = np.nonzero(bitstring)[0]
The advantage with np.non_zero
is that it finds indices of bits set to 1 much faster than if one iterates (with a for loop) over each bit and checks if it is set to 1.
Now, NumPy can generate a random bitstring faster via np.random.bit_generator.randbits(n)
. The problem is that it returns it as an integer, on which I cannot use np.nonzero
anymore. I saw that for integers one can get the count of bits set to 1 in an integer x by using x.bit_count()
, however there is no function to get the indices where bits are set to 1. So currently, I have to resort to a slow for
loop, hence losing the initial speedup given by np.random.bit_generator.randbits(n)
.
How would you do something similar to (and as fast as) np.non_zero
, but on integers instead?
Thank you in advance for your suggestions!
A minor optimisation to your code would be to use the new style random interface and generate
bool
s rather than 64bit integersthis causes it to take ~24 µs on my laptop, tested
n
upto 128.I've previously noticed that getting a Numpy to generate a permutation is particularly fast, hence my comment above. Leading to:
which takes between ~7 µs and ~10 µs depending on
n
. It also returns the indicies out of order, not sure if that's an issue for you. If yourn
isn't changing much, you could also swap to usingrng.shuffle
on an pre-allocated array, something like:which saves a couple of microseconds.