Is there a way to calculate weighted average from 2 different dataframes?

41 Views Asked by At

I have 2 dataframes - One with the values under each group. The second dataframe has weights for each group to get the weighted average.

df1 =

product     group1  group2  group3  group4  group5
xyz         223056  907858  4803    586623  835423 
abc         557042  382306  915624  786236  413817 
pqr         951348  723503  587397  141975  567826

df2 =

group    weight
group1   0.5
group2   7.5
group3   10
group4   1.5
group5   50

I'd like to get a weighted average of all groups for each product and append it to df1. For example :

product     group1  group2  group3  group4  group5 weighted-average
xyz         223056  907858  4803    586623  835423   40
abc         557042  382306  915624  786236  413817   25
pqr         951348  723503  587397  141975  567826   30

Is there an easy way to do this? I managed to do this in excel, but I would like to try it in pandas as I'm learning it still.

2

There are 2 best solutions below

0
jezrael On BEST ANSWER

Use np.average with DataFrame.reindex for same order columns and index:

s = df2.set_index('group')['weight']
df1['wa'] = np.average(df1.reindex(columns=s.index), weights=s, axis=1)

print (df1)
  product  group1  group2  group3  group4  group5             wa
0     xyz  223056  907858    4803  586623  835423  713950.755396
1     abc  557042  382306  915624  786236  413817  491687.194245
2     pqr  951348  723503  587397  141975  567826  581009.769784

If always same order is possible use:

df1 = df1.set_index('product')
df1['wa'] = np.average(df1, weights=df2['weight'], axis=1)

print (df1)
         group1  group2  group3  group4  group5             wa
product                                                       
xyz      223056  907858    4803  586623  835423  713950.755396
abc      557042  382306  915624  786236  413817  491687.194245
pqr      951348  723503  587397  141975  567826  581009.769784
0
mozway On

You can multiply by the weight, sum and divide by the sum of weights:

df1['wa'] = (df1.mul(df2.set_index('group')['weight'])
                .sum(axis=1).div(df2['weight'].sum())
             )

output:

  product  group1  group2  group3  group4  group5             wa
0     xyz  223056  907858    4803  586623  835423  713950.755396
1     abc  557042  382306  915624  786236  413817  491687.194245
2     pqr  951348  723503  587397  141975  567826  581009.769784