pandas cut returns fewer bins


When I use pd.cut(df['series'], 100), I get far fewer than 100 unique bins.

pd.cut(df['series'], 100).nunique()
>>> 30

Why is that? I need 100 bins, which I only get if I ask for 500.

pd.cut(df['series'], 500).nunique()
>>> 100

Here is a description of my data; there are no missing values:

df['series'].describe()

series
count  6406.0
mean   6.041080237277553
std    12.334466838167403
min    0.03
25%    1.22
50%    2.71
75%    5.76
max    272.19

Best answer

According to the pandas documentation, pd.cut called with an integer number of bins divides the range of the array-like (say, a Series) into that many equal-width bins.

The values in the Series are then assigned to these equal-width bins. If no value falls within a given bin, nothing is labeled with that bin, so it does not appear among the unique values of the result. With your data, each of the 100 bins is about 2.72 wide, (272.19 - 0.03) / 100, so at least half of the values (the median is 2.71) land in the first bin alone, and many bins in the long tail stay empty.

For example, take a Series containing [1, 2, 5, 7, 10] and call pd.cut with 5 bins. The bins turn out to be (0.991, 2.8] < (2.8, 4.6] < (4.6, 6.4] < (6.4, 8.2] < (8.2, 10.0]. The bin (2.8, 4.6] is not represented in the result, since no value falls within it.

Therefore your binned data will only contain 4 unique bins.
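The small example above can be checked directly. This is a minimal sketch; the variable names (s, binned, n_defined, n_used, empty) are only illustrative. It compares the number of intervals pd.cut defines with the number that actually receive data, and lists the empty ones.

```python
import pandas as pd

s = pd.Series([1, 2, 5, 7, 10])
binned = pd.cut(s, 5)

# All intervals pd.cut created, whether or not any value fell into them
n_defined = len(binned.cat.categories)

# Only the intervals that actually contain at least one value
n_used = binned.nunique()

# value_counts on a categorical Series also reports categories with zero hits
counts = binned.value_counts(sort=False)
empty = [interval for interval, count in counts.items() if count == 0]

print(n_defined, n_used)
print(empty)
```

So the 5 bins still exist as categories; nunique() is simply counting the ones that are occupied.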

If you specifically need 100 bins (and don't care about them being equal-width), I'd suggest pd.qcut. As long as your Series is longer than the number of bins you need and doesn't contain many duplicate values, this should return the number of bins you specify.
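A rough comparison of the two functions, as a sketch only: the Series below is synthetic right-skewed data generated for illustration (a lognormal sample of the same length as in the question), not the asker's actual data, and the names equal_width and equal_count are made up here.

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed stand-in for df['series'] (an assumption, not the real data)
rng = np.random.default_rng(0)
s = pd.Series(rng.lognormal(mean=1.0, sigma=1.2, size=6406))

# Equal-width bins: many of them fall in the sparse tail and stay empty
equal_width = pd.cut(s, 100)

# Quantile-based bins: edges are placed so each bin holds roughly the same number of values
equal_count = pd.qcut(s, 100)

print(equal_width.nunique())
print(equal_count.nunique())
```

If the data has enough repeated values that two quantile edges coincide, pd.qcut raises a ValueError by default; passing duplicates='drop' merges the coinciding edges instead, at the cost of getting fewer bins than requested.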