Turning off x% of ACTIVE bits in a binary numpy matrix as opposed to turning off x% of ALL bits

I have a binary numpy matrix and I want to randomly turn off 30% of it, where "turning off 30%" means replacing 30% of the 1s with 0s. I want to do this repeatedly, so if I do it 5 times, starting from a matrix that is all 1s, I expect the final matrix to retain 100*(1-0.3)^5 ≈ 16.8% of the original 1s.
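
For reference, the arithmetic behind that expectation can be sketched as follows. The names `p_off`, `rounds`, and `remaining` are illustrative only, and the sketch assumes each round removes exactly the stated fraction of whatever 1s are left.

```python
p_off = 0.3   # fraction of the remaining 1s turned off in each round
rounds = 5    # number of times the operation is repeated

remaining = 1.0  # fraction of the original 1s still active
for k in range(1, rounds + 1):
    # Each round keeps (1 - p_off) of the 1s that survived so far.
    remaining *= 1 - p_off
    print(f"after round {k}: {remaining:.2%} of the original 1s remain")
```

The last line printed corresponds to 0.7^5, which is about 16.8%.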

The important thing is that I want to turn off 30% of the active bits (the ones), as opposed to 30% of the whole matrix (the ones and the zeros, where turning off a zero just leaves it zero).

I came up with a method to do this, but it doesn't seem to achieve the above: after 5 rounds of turning off 30%, the matrix is 23% 1s rather than the expected ~17%.

To illustrate through example, my approach is as follows:

>>> import numpy as np
>>> mask=np.array([[1,1,1,1,1],[1,1,1,0,0],[1,1,0,0,0]])
>>> mask
array([[1, 1, 1, 1, 1],
       [1, 1, 1, 0, 0],
       [1, 1, 0, 0, 0]])
>>> np.where(mask==0, np.zeros_like(mask), mask * np.random.binomial(1, 0.7, mask.shape))
array([[1, 1, 1, 0, 0],
       [0, 1, 1, 0, 0],
       [1, 1, 0, 0, 0]])

The code above gives a new matrix where if a bit is 0 it remains 0, and if it is 1 it is turned off 30% of the time.

In this small example, everything seems to work fine as I have removed exactly 30% of the ones (I had 10, now I have 7). But I don't think my approach generalizes well for a large matrix. I believe this is due to the following:

Although the Bernoulli trials are supposed to be independent of each other, numpy probably tries to ensure that overall, 30% of all the trials are Tails. But in my code "all trials" equals the size of the full matrix, and not the number of ones in my matrix, and this is what causes the problem.

What is a clean pythonic way to nullify 30% of the active bits, as opposed to 30% of all the bits?

There is 1 solution below

It seems I figured it out. I create a new zero matrix of the same shape and fill in only the positions where the initial matrix has 1s. The Bernoulli trial is therefore performed once per 1 in the initial matrix, rather than once per element of the full matrix, as desired.

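>>> import numpy as np
>>> mask = np.array([[1,1,1,1,1],[1,1,1,0,0],[1,1,0,0,0]])
>>> new_mask = np.zeros_like(mask)
>>> # one Bernoulli(0.7) draw per active bit; positions that were 0 stay 0
>>> new_mask[mask==1] = np.random.binomial(1, 0.7, mask[mask==1].shape) * mask[mask==1]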
>>> new_mask
array([[0, 1, 0, 1, 1],
       [1, 1, 1, 0, 0],
       [1, 0, 0, 0, 0]])

This works for large matrices, and after 5 rounds it leaves ~16% of the original 1s, as expected (see the question body for context).
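
As a rough check of the above, here is a minimal sketch that wraps the approach in a function and repeats it 5 times on a large all-ones matrix. The function names `turn_off_random` and `turn_off_exact` are hypothetical, and the sketch uses `np.random.default_rng` rather than the `np.random.binomial` call shown above. `turn_off_exact` is a different technique from the one in this answer: instead of flipping a coin per active bit, it picks exactly `round(p_off * number_of_ones)` active positions without replacement, which may be preferable when the count has to be exact rather than 30% on average.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible


def turn_off_random(mask, p_off, rng):
    """Turn off each active bit independently with probability p_off."""
    new_mask = np.zeros_like(mask)
    active = mask == 1
    # One Bernoulli draw per active bit; zeros are never touched.
    new_mask[active] = rng.binomial(1, 1 - p_off, size=int(active.sum()))
    return new_mask


def turn_off_exact(mask, p_off, rng):
    """Turn off exactly round(p_off * number_of_ones) of the active bits."""
    new_mask = mask.copy()
    ones = np.flatnonzero(new_mask)        # flat indices of the active bits
    n_off = int(round(p_off * ones.size))  # how many bits to switch off
    off = rng.choice(ones, size=n_off, replace=False)
    new_mask.flat[off] = 0
    return new_mask


mask = np.ones((1000, 1000), dtype=int)
random_mask, exact_mask = mask, mask
for _ in range(5):
    random_mask = turn_off_random(random_mask, 0.3, rng)
    exact_mask = turn_off_exact(exact_mask, 0.3, rng)

frac_random = random_mask.mean()  # fraction of 1s left, Bernoulli version
frac_exact = exact_mask.mean()    # fraction of 1s left, exact-count version
print(frac_random, frac_exact, 0.7 ** 5)
```

Both fractions should land close to 0.7^5. The Bernoulli version fluctuates slightly from run to run, while the exact-count version only deviates through rounding of the per-round count.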