Using Pandas to stack a data frame in a specifc format

49 Views Asked by At

I have a panas dataframe as follows:

 df 

   Prod  ProdDesc   tot    avg   qtr        val_qtr
   A      Cyl       110   8.7    202301     12
   A      Cyl       110   8.7    202302     56.9
   A      Cyl       110   8.7    202303      9
   A      Cyl       110   8.7    202304      0

So what I want is to stack/transpose the dataframe. I used pandas melt,

    df_tra = df.melt(id_vars=['Prod', 'ProdDesc'], var_name='Attrib', value_name='Value')
    df_tra.drop_duplicates()

So my output comes as :

df_tra


    Prod  ProdDesc  Attrib   Value
     A    Cyl       tot      110           
     A    Cyl       avg      8.7           
     A    Cyl       quarter  202301    
     A    Cyl       quarter  202302
     A    Cyl       quarter  202303
     A    Cyl       quarter  202304        
     A    Cyl       val_qtr  12    
     A    Cyl       val_qtr  56.9
     A    Cyl       val_qtr  9
     A    Cyl       val_qtr  0  

but the output what I want/desire is different. What I want is the following:

 df_actual_wanted 

    Prod  ProdDesc  Attrib   Value
    A     Cyl       tot      110           
    A     Cyl       avg      8.7           
    A     Cyl       202301   12 
    A     Cyl       202302   56.9
    A     Cyl       202303    9
    A     Cyl       202304    0     

How can I achieve that?

1

There are 1 best solutions below

0
jezrael On

Select multiple columns with DataFrame.drop_duplicates and DataFrame.melt, and join with snoter subset with rename by concat, last if necessary sorting by both columns:

df1 = (df[['Prod','ProdDesc','tot','avg']]
               .drop_duplicates()
               .melt(id_vars=['Prod', 'ProdDesc'], var_name='Attrib', value_name='Value'))
df2 = (df[['Prod','ProdDesc','qtr','val_qtr']]
               .rename(columns={'qtr':'Attrib','val_qtr':'Value'}))

out = pd.concat([df1, df2]).sort_values(['Prod','ProdDesc'], ignore_index=True)
print (out)
  Prod ProdDesc  Attrib  Value
0    A      Cyl     tot  110.0
1    A      Cyl     avg    8.7
2    A      Cyl  202301   12.0
3    A      Cyl  202302   56.9
4    A      Cyl  202303    9.0
5    A      Cyl  202304    0.0

If default index and sorting need same like original change solution a bit:

print (df)
   Prod ProdDesc  tot   avg     qtr  val_qtr
0     A      Cyl  110  8.70  202301     12.0
1     A      Cyl  110  8.70  202302     56.9
2     A      Cyl  110  8.70  202303      9.0
3     A      Cyl  110  8.70  202304      0.0
4     B      Cyl  133  8.76  202301     12.0
5     B      Cyl  133  8.76  202302     56.9
6     B      Cyl  133  8.76  202303      9.0
7     B      Cyl  133  8.76  202304      0.0
8     A     Cyl1  117  8.37  202301     12.0
9     A     Cyl1  117  8.37  202302     56.9
10    A     Cyl1  117  8.37  202303      9.0
11    A     Cyl1  117  8.37  202304      0.0

df1 = (df[['Prod','ProdDesc','tot','avg']]
               .drop_duplicates()
               .melt(id_vars=['Prod', 'ProdDesc'], 
                     var_name='Attrib', 
                     value_name='Value',
                     ignore_index=False))
df2 = (df[['Prod','ProdDesc','qtr','val_qtr']]
               .rename(columns={'qtr':'Attrib','val_qtr':'Value'}))

out = pd.concat([df1, df2]).sort_index(kind='stable', ignore_index=True)

print (out)
   Prod ProdDesc  Attrib   Value
0     A      Cyl     tot  110.00
1     A      Cyl     avg    8.70
2     A      Cyl  202301   12.00
3     A      Cyl  202302   56.90
4     A      Cyl  202303    9.00
5     A      Cyl  202304    0.00
6     B      Cyl     tot  133.00
7     B      Cyl     avg    8.76
8     B      Cyl  202301   12.00
9     B      Cyl  202302   56.90
10    B      Cyl  202303    9.00
11    B      Cyl  202304    0.00
12    A     Cyl1     tot  117.00
13    A     Cyl1     avg    8.37
14    A     Cyl1  202301   12.00
15    A     Cyl1  202302   56.90
16    A     Cyl1  202303    9.00
17    A     Cyl1  202304    0.00

If small data or performance is not important:

def f(x):
    y = x[['tot','avg']].iloc[0].T.reset_index().set_axis(['Attrib', 'Value'], axis=1)
    return pd.concat([y, x[['Attrib','Value']]])

out = (df.rename(columns={'qtr':'Attrib','val_qtr':'Value'})
         .groupby(['Prod', 'ProdDesc'], sort=False)
         .apply(f)
         .droplevel(-1)
         .reset_index())
print (out)
   Prod ProdDesc  Attrib   Value
0     A      Cyl     tot  110.00
1     A      Cyl     avg    8.70
2     A      Cyl  202301   12.00
3     A      Cyl  202302   56.90
4     A      Cyl  202303    9.00
5     A      Cyl  202304    0.00
6     B      Cyl     tot  133.00
7     B      Cyl     avg    8.76
8     B      Cyl  202301   12.00
9     B      Cyl  202302   56.90
10    B      Cyl  202303    9.00
11    B      Cyl  202304    0.00
12    A     Cyl1     tot  117.00
13    A     Cyl1     avg    8.37
14    A     Cyl1  202301   12.00
15    A     Cyl1  202302   56.90
16    A     Cyl1  202303    9.00
17    A     Cyl1  202304    0.00