Creating a ratio between variables in a Stacked dataframe grouped by some Columns

Question

Creating a ratio between variables in a Stacked dataframe grouped by some Columns

64 Views Asked by Stan At 13 December 2023 at 23:08

I have a df as follows:

df_in 
      G1    G2       TPE           QC
      
      A     S1       td            2
      A     S1       ts            4
      A     S2       td            6
      A     S2       ts            3
      B     S1       td            20
      B     S1       ts            40
      B     S2       td            60
      B     S2       ts            30
      C     S1       td            90
      D     S2       ts            7

So the output should be grouped by Columns G1 & G2 and for each such group, do a row wise ratio for the column QC like (ts/td) where values are td and ts for the column TPE and rename the new variable in column TPE as ratio. It should also contain the original rows as it is. Also it should be noted that for the TPE column some groups may not have both ts and td values. In such cases there will be no ratio or the ratio should be kept as blank.

So the output should be this:

       df_out


         G1    G2       TPE           QC
      
         A     S1       td            2
         A     S1       ts            4
         A     S2       td            6
         A     S2       ts            3
         B     S1       td            20
         B     S1       ts            40
         B     S2       td            60
         B     S2       ts            30
         C     S1       td            90
         D     S2       ts            7
         A     S1       ratio         2
         A     S2       ratio         0.5
         B     S1       ratio         2
         B     S2       ratio        0.5
         C     S1       ratio         
         D     S2       ratio

I tried the following, but its omitting the blank values for group C & D with blank ratios:

def calculate_ratio(group):
    td_row = group[group['TPE'] == 'td']
    ts_row = group[group['TPE'] == 'ts']
    if not td_row.empty and not ts_row.empty:
        ratio = ts_row['QC'].values[0] / td_row['QC'].values[0]
        return pd.DataFrame({'G1': [group['G1'].iloc[0]], 
                             'G2': [group['G2'].iloc[0]], 
                             'TPE': ['ratio'], 
                             'QC': [ratio]})
    return pd.DataFrame()


grouped = df_in.groupby(['G1', 'G2']).apply(calculate_ratio).reset_index(drop=True)

df_out = pd.concat([df_in, grouped], ignore_index=True)

Any help will be immensely appreciated.

Original Q&A

There are 3 best solutions below

**Panda Kim** · Answer 1 · 2023-12-13T23:32:57.090000

Code

tmp = df_in.set_index(['G1', 'G2', 'TPE']).unstack()['QC']
out = pd.concat([df_in, tmp['ts'].div(tmp['td']).reset_index(name='QC').assign(TPE='ratio')])

out:

    G1  G2  TPE     QC
0   A   S1  td      2.0
1   A   S1  ts      4.0
2   A   S2  td      6.0
3   A   S2  ts      3.0
4   B   S1  td      20.0
5   B   S1  ts      40.0
6   B   S2  td      60.0
7   B   S2  ts      30.0
8   C   S1  td      90.0
9   D   S2  ts      7.0
0   A   S1  ratio   2.0
1   A   S2  ratio   0.5
2   B   S1  ratio   2.0
3   B   S2  ratio   0.5
4   C   S1  ratio   NaN
5   D   S2  ratio   NaN

Intermedate

tmp:

    TPE td      ts
G1  G2      
A   S1  2.0     4.0
    S2  6.0     3.0
B   S1  20.0    40.0
    S2  60.0    30.0
C   S1  90.0    NaN
D   S2  NaN     7.0

**PaulS** · Answer 2 · 2023-12-14T00:05:49.730000

Another possible solution, which uses multi-indexing, pandas.xs to separate the ts values from the td ones, and finally uses pandas.concat to concatenate the two dataframes:

s = df.set_index(['G1', 'G2', 'TPE'])

pd.concat([
    df, s.xs('ts', level=2).div(s.xs('td', level=2))
    .reset_index().assign(TPE='ratio')])

Output:

  G1  G2    TPE    QC
0  A  S1     td   2.0
1  A  S1     ts   4.0
2  A  S2     td   6.0
3  A  S2     ts   3.0
4  B  S1     td  20.0
5  B  S1     ts  40.0
6  B  S2     td  60.0
7  B  S2     ts  30.0
8  C  S1     td  90.0
9  D  S2     ts   7.0
0  A  S1  ratio   2.0
1  A  S2  ratio   0.5
2  B  S1  ratio   2.0
3  B  S2  ratio   0.5
4  C  S1  ratio   NaN
5  D  S2  ratio   NaN

**Nick** · Answer 3 · 2023-12-14T06:38:16.600000

You could also use pivot and pipe to do this as a one-liner:

out = pd.concat([
    df_in, 
    df_in.pivot(index=['G1','G2'], columns='TPE', values='QC')
         .pipe(lambda df:df['ts'].div(df['td']))
         .reset_index(name='QC')
         .assign(TPE='ratio')
    ]
)

Output:

  G1  G2    TPE    QC
0  A  S1     td   2.0
1  A  S1     ts   4.0
2  A  S2     td   6.0
3  A  S2     ts   3.0
4  B  S1     td  20.0
5  B  S1     ts  40.0
6  B  S2     td  60.0
7  B  S2     ts  30.0
8  C  S1     td  90.0
9  D  S2     ts   7.0
0  A  S1  ratio   2.0
1  A  S2  ratio   0.5
2  B  S1  ratio   2.0
3  B  S2  ratio   0.5
4  C  S1  ratio   NaN
5  D  S2  ratio   NaN

You can add a fillna('') to the result if you want the NaN values to be empty strings rather than NaN.

Creating a ratio between variables in a Stacked dataframe grouped by some Columns

There are 3 best solutions below

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in GROUP-BY

Related Questions in MULTIPLE-COLUMNS

Related Questions in GROUP-CONCAT

Trending Questions

Popular # Hahtags

Popular Questions