How to "bin" a numpy array using custom (non-linearly spaced) buckets?


How can I "bin" the array below in numpy using these custom buckets:

import numpy as np
bins = np.array([-0.1 , -0.07, -0.02,  0.  ,  0.02,  0.07,  0.1 ])
array = np.array([-0.21950869, -0.02854823,  0.22329239, -0.28073936, -0.15926265,
              -0.43688216,  0.03600587, -0.05101109, -0.24318651, -0.06727875])

That is, replace each value in array with the following:

-0.1 where `value` < -0.085
-0.07 where -0.085 <= `value` < -0.045
-0.02 where -0.045 <= `value` < -0.01
0.0 where -0.01 <= `value` < 0.01
0.02 where 0.01 <= `value` < 0.045
0.07 where 0.045 <= `value` < 0.085
0.1 where `value` >= 0.085

The expected output would be:

array = np.array([-0.1, -0.02,  0.1, -0.1, -0.1, -0.1,  0.02, -0.07, -0.1, -0.07])

I recognise that numpy has a digitize function; however, it returns the index of the bin, not the bin value itself. That is:

np.digitize(array, bins)
np.array([0, 2, 7, 0, 0, 0, 5, 2, 0, 2])

Best answer

Get those mid-values by averaging across consecutive bin values in pairs. Then, use np.searchsorted or np.digitize to get the indices using the mid-values. Finally, index into bins for the output.

Mid-values :

mid_bins = (bins[1:] + bins[:-1])/2.0
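As a quick check, the mid-values for the bins in the question should land on the thresholds listed there (-0.085, -0.045, and so on), up to floating-point rounding. A minimal sketch, reusing the question's bins:

```python
import numpy as np

bins = np.array([-0.1, -0.07, -0.02, 0.0, 0.02, 0.07, 0.1])

# Average each pair of neighbouring bin values to get the bucket edges.
mid_bins = (bins[1:] + bins[:-1]) / 2.0

# 7 bins are separated by 6 edges.
print(mid_bins.shape)
print(mid_bins)
```

This assumes bins is sorted in ascending order, which both searchsorted and digitize rely on.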

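Indices with searchsorted or digitize (these are alternatives, use either one). Pass side='right' to searchsorted so that a value exactly equal to a mid-value goes to the upper bin, as the question's intervals require; digitize already behaves that way by default :

idx = np.searchsorted(mid_bins, array, side='right')
idx = np.digitize(array, mid_bins)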
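The two calls only differ for values that sit exactly on an edge. The sketch below uses made-up edges and values (chosen to be exactly representable in floating point, and not part of the question's data) to show how the default side='left' of searchsorted compares with side='right' and with digitize:

```python
import numpy as np

# Hypothetical edges and values, chosen only to illustrate ties.
edges = np.array([0.0, 1.0, 2.0])
x = np.array([-0.5, 0.0, 0.5, 1.0, 2.5])

left = np.searchsorted(edges, x)                  # side='left' is the default
right = np.searchsorted(edges, x, side='right')
dig = np.digitize(x, edges)                       # right=False is the default

print(left)   # [0 0 1 1 3]
print(right)  # [0 1 1 2 3]
print(dig)    # [0 1 1 2 3]
```

A value equal to an edge falls in the lower bucket with side='left' and in the upper bucket with side='right' or digitize. For the sample array in the question no value sits on a mid-value, so all three give the same indices there.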

Output :

out = bins[idx]
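Putting the three steps together, here is a self-contained sketch run against the data from the question. The helper name snap_to_bins is made up for illustration, and the code assumes bins is sorted in ascending order:

```python
import numpy as np

def snap_to_bins(values, bins):
    """Replace each value with the bin value whose bucket contains it.

    Hypothetical helper; assumes `bins` is sorted ascending.
    """
    bins = np.asarray(bins, dtype=float)
    # Bucket edges are the midpoints between neighbouring bin values.
    mid_bins = (bins[1:] + bins[:-1]) / 2.0
    # Indices run from 0 to len(bins) - 1, so they always index bins safely.
    idx = np.digitize(values, mid_bins)
    return bins[idx]

bins = np.array([-0.1, -0.07, -0.02, 0.0, 0.02, 0.07, 0.1])
array = np.array([-0.21950869, -0.02854823, 0.22329239, -0.28073936, -0.15926265,
                  -0.43688216, 0.03600587, -0.05101109, -0.24318651, -0.06727875])

out = snap_to_bins(array, bins)
print(out)
```

The result matches the expected output given in the question. Because there is one fewer mid-value than there are bins, values below the first edge map to index 0 and values above the last edge map to the last bin, so no clipping is needed.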