DataFrame balancing by using reindex

483 Views Asked by At

I have a DataFrame with an index called _ItemId and a _ChannelId column, plus other value columns simplified below as _SomeValue. I would like to transform the DataFrame to have exactly the same _ItemId elements for each _Channel modality (which I refer to as "balancing", like in panel data).

In the example below, in the df that I want, 3 rows were inserted with Nan values to balance the DataFrame.

I thought I would use the reindex() function but I am uncertain on how to create the index I need to insert the missing elements. Any suggestions much appreciated.

What I have:

       _ChannelId  _SomeValue
_ItemId                 
6559085   MICRO    AA
6589413   MICRO    AB
6589421   MICRO    AA
6781144   MICRO    AC
8184089   MICRO    AA
6559085  WILSON    AA
6589413  WILSON    AC

What I want:

       _ChannelId  _SomeValue
_ItemId                 
6559085   MICRO    AA
6589413   MICRO    AB
6589421   MICRO    AA
6781144   MICRO    AC
8184089   MICRO    AA
6559085  WILSON    AA
6589413  WILSON    AC
6589421  WILSON    NaN
6781144  WILSON    NaN
8184089  WILSON    NaN

Thanks

1

There are 1 best solutions below

2
On BEST ANSWER

First get the indexes of the group you want to take as reference, for example:

idx = df.groupby('_ChannelId').groups['MICRO']

May be you want the union, unique indexes, indexes of the first group, etc, it is not clearly specified in the question. Then you can do:

df.groupby('_ChannelId')\
  .apply(lambda x: x.reindex(idx))\
  .drop('_ChannelId', axis=1)\
  .reset_index('_ChannelId')

        _ChannelId  _SomeValue
_ItemId     
6559085 MICRO      AA
6589413 MICRO      AB
6589421 MICRO      AA
6781144 MICRO      AC
8184089 MICRO      AA
6559085 WILSON     AA
6589413 WILSON     AC
6589421 WILSON     NaN
6781144 WILSON     NaN
8184089 WILSON     NaN