Creating a ratio between variables in a Stacked dataframe grouped by some Columns

64 Views Asked by At

I have a df as follows:

df_in 
      G1    G2       TPE           QC
      
      A     S1       td            2
      A     S1       ts            4
      A     S2       td            6
      A     S2       ts            3
      B     S1       td            20
      B     S1       ts            40
      B     S2       td            60
      B     S2       ts            30
      C     S1       td            90
      D     S2       ts            7
      

So the output should be grouped by Columns G1 & G2 and for each such group, do a row wise ratio for the column QC like (ts/td) where values are td and ts for the column TPE and rename the new variable in column TPE as ratio. It should also contain the original rows as it is. Also it should be noted that for the TPE column some groups may not have both ts and td values. In such cases there will be no ratio or the ratio should be kept as blank.

So the output should be this:

       df_out


         G1    G2       TPE           QC
      
         A     S1       td            2
         A     S1       ts            4
         A     S2       td            6
         A     S2       ts            3
         B     S1       td            20
         B     S1       ts            40
         B     S2       td            60
         B     S2       ts            30
         C     S1       td            90
         D     S2       ts            7
         A     S1       ratio         2
         A     S2       ratio         0.5
         B     S1       ratio         2
         B     S2       ratio        0.5
         C     S1       ratio         
         D     S2       ratio           
         
         

I tried the following, but its omitting the blank values for group C & D with blank ratios:

def calculate_ratio(group):
    td_row = group[group['TPE'] == 'td']
    ts_row = group[group['TPE'] == 'ts']
    if not td_row.empty and not ts_row.empty:
        ratio = ts_row['QC'].values[0] / td_row['QC'].values[0]
        return pd.DataFrame({'G1': [group['G1'].iloc[0]], 
                             'G2': [group['G2'].iloc[0]], 
                             'TPE': ['ratio'], 
                             'QC': [ratio]})
    return pd.DataFrame()


grouped = df_in.groupby(['G1', 'G2']).apply(calculate_ratio).reset_index(drop=True)

df_out = pd.concat([df_in, grouped], ignore_index=True)

Any help will be immensely appreciated.

3

There are 3 best solutions below

0
Panda Kim On

Code

tmp = df_in.set_index(['G1', 'G2', 'TPE']).unstack()['QC']
out = pd.concat([df_in, tmp['ts'].div(tmp['td']).reset_index(name='QC').assign(TPE='ratio')])

out:

    G1  G2  TPE     QC
0   A   S1  td      2.0
1   A   S1  ts      4.0
2   A   S2  td      6.0
3   A   S2  ts      3.0
4   B   S1  td      20.0
5   B   S1  ts      40.0
6   B   S2  td      60.0
7   B   S2  ts      30.0
8   C   S1  td      90.0
9   D   S2  ts      7.0
0   A   S1  ratio   2.0
1   A   S2  ratio   0.5
2   B   S1  ratio   2.0
3   B   S2  ratio   0.5
4   C   S1  ratio   NaN
5   D   S2  ratio   NaN

Intermedate

tmp:

    TPE td      ts
G1  G2      
A   S1  2.0     4.0
    S2  6.0     3.0
B   S1  20.0    40.0
    S2  60.0    30.0
C   S1  90.0    NaN
D   S2  NaN     7.0
0
PaulS On

Another possible solution, which uses multi-indexing, pandas.xs to separate the ts values from the td ones, and finally uses pandas.concat to concatenate the two dataframes:

s = df.set_index(['G1', 'G2', 'TPE'])

pd.concat([
    df, s.xs('ts', level=2).div(s.xs('td', level=2))
    .reset_index().assign(TPE='ratio')])

Output:

  G1  G2    TPE    QC
0  A  S1     td   2.0
1  A  S1     ts   4.0
2  A  S2     td   6.0
3  A  S2     ts   3.0
4  B  S1     td  20.0
5  B  S1     ts  40.0
6  B  S2     td  60.0
7  B  S2     ts  30.0
8  C  S1     td  90.0
9  D  S2     ts   7.0
0  A  S1  ratio   2.0
1  A  S2  ratio   0.5
2  B  S1  ratio   2.0
3  B  S2  ratio   0.5
4  C  S1  ratio   NaN
5  D  S2  ratio   NaN
0
Nick On

You could also use pivot and pipe to do this as a one-liner:

out = pd.concat([
    df_in, 
    df_in.pivot(index=['G1','G2'], columns='TPE', values='QC')
         .pipe(lambda df:df['ts'].div(df['td']))
         .reset_index(name='QC')
         .assign(TPE='ratio')
    ]
)

Output:

  G1  G2    TPE    QC
0  A  S1     td   2.0
1  A  S1     ts   4.0
2  A  S2     td   6.0
3  A  S2     ts   3.0
4  B  S1     td  20.0
5  B  S1     ts  40.0
6  B  S2     td  60.0
7  B  S2     ts  30.0
8  C  S1     td  90.0
9  D  S2     ts   7.0
0  A  S1  ratio   2.0
1  A  S2  ratio   0.5
2  B  S1  ratio   2.0
3  B  S2  ratio   0.5
4  C  S1  ratio   NaN
5  D  S2  ratio   NaN

You can add a fillna('') to the result if you want the NaN values to be empty strings rather than NaN.