I don't understand groupby behavior in Pandas with categorical data


Case 1: why is the groupby adding 'missing' combinations of 'A' and 'B'?

Case 2: the result is the same length as the entry DataFrame.

If 'A' and 'B' represent integers, I get the same result in both cases.

[Images omitted: entry data; categorical; integers]


There is 1 best solution below

Corralien On

When your grouping keys have the category dtype, the output contains every combination of the keys' categories (their Cartesian product), including combinations that never occur in the data. This is caused by the observed=False default of the groupby method.

observed: bool, default False
This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

If you use sample.groupby(['A', 'B'], observed=True)['C'].count().reset_index(), the output will be the same as when A and B are integers.
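
As a minimal sketch of the difference: the original sample DataFrame is not shown in the post, so the data below is made up for illustration. observed is passed explicitly in both categorical calls so the result does not depend on which default your pandas version uses.

```python
import pandas as pd

# Hypothetical stand-in for the 'sample' frame from the question.
sample = pd.DataFrame({
    "A": [1, 1, 2, 3],
    "B": [10, 20, 10, 30],
    "C": [0.1, 0.2, 0.3, 0.4],
})

# Same data, but with categorical grouping keys.
cat = sample.astype({"A": "category", "B": "category"})

# observed=False: one row per (A, B) category pair; unseen pairs get a count of 0.
all_pairs = cat.groupby(["A", "B"], observed=False)["C"].count().reset_index()

# observed=True: only the pairs that actually occur in the data.
seen_pairs = cat.groupby(["A", "B"], observed=True)["C"].count().reset_index()

# Integer keys: only the pairs that actually occur in the data.
int_pairs = sample.groupby(["A", "B"])["C"].count().reset_index()

print(len(all_pairs))   # 3 categories of A x 3 categories of B = 9 rows
print(len(seen_pairs))  # 4 rows
print(len(int_pairs))   # 4 rows
```

With this data, five of the nine rows in all_pairs have a count of 0; those are the 'missing' combinations from Case 1.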