I'm trying to calculate variability statistics from two df's - one with current data and one df with average data for the month. Suppose I have a df "DF1" that looks like this:
Name year month output
0 A 1991 1 10864.8
1 A 1997 2 11168.5
2 B 1994 1 6769.2
3 B 1998 2 3137.91
4 B 2002 3 4965.21
and a df called "DF2" that contains monthly averages from multiple years such as:
Name month output_average
0 A 1 11785.199
1 A 2 8973.991
2 B 1 8874.113
3 B 2 6132.176667
4 B 3 3018.768
and, i need a new DF calling it "DF3" that needs to look like this with the calculations specific to the change in the "name" column and for each "month" change:
Name year month Variability
0 A 1991 1 -0.078097875
1 A 1997 2 0.24454103
2 B 1994 1 -0.237197002
3 B 1998 2 -0.488287737
4 B 2002 3 0.644782
I have tried options like this below but with errors about duplicating the axis or key errors -
DF3['variability'] =
((DF1.output/DF2.set_index('month'['output_average'].reindex(DF1['name']).values)-1)
Thank you for your help in leaning Python row calculations coming from matlab!
For two columns, you can better use
merge
instead ofset_index
:Output: