Given a multiindex columns
a ...
E1 ... E3
g1 g2 g3 ... g1 g2 g3
0 0.548814 0.715189 0.602763 ... 0.437587 0.891773 0.963663
1 0.383442 0.791725 0.528895 ... 0.087129 0.020218 0.832620
2 0.778157 0.870012 0.978618 ... 0.118274 0.639921 0.143353
3 0.944669 0.521848 0.414662 ... 0.568434 0.018790 0.617635
4 0.612096 0.616934 0.943748 ... 0.697631 0.060225 0.666767
5 0.670638 0.210383 0.128926 ... 0.438602 0.988374 0.102045
6 0.208877 0.161310 0.653108 ... 0.158970 0.110375 0.656330
7 0.138183 0.196582 0.368725 ... 0.096098 0.976459 0.468651
8 0.976761 0.604846 0.739264 ... 0.296140 0.118728 0.317983
9 0.414263 0.064147 0.692472 ... 0.093941 0.575946 0.929296
[10 rows x 9 columns]
I would like to apply multiple column filtered by the second level (i.e., E1
, E2
, E3
) to a functions (e.g., ration_type1
,ration_type2
, or can be more in actual implementation).
For example. Assume we are to compute the second level of E1
under the function ration_type1
and ration_type2
. Then we only process the following df
a
E1
g1 g2 g3
0 0.548814 0.715189 0.602763
1 0.383442 0.791725 0.528895
.................
8 0.976761 0.604846 0.739264
9 0.414263 0.064147 0.692472
To generalise all second level, I than rely on the list comprehension as below
for each of the ration_type1
and ration_type2
.
all_df1 = [ration_type1(df.loc[:, (slice(None), dgroup, slice(None))]) for dgroup in [`E1`, `E2`, `E3`]]
all_df2 = [ration_type2(df.loc[:, (slice(None), dgroup, slice(None))]) for dgroup in [`E1`, `E2`, `E3`]]
before concat it back to the orignal df
.
However, I wonder whether there is more elegant and compact way than the list comprehension
approach. This is because, in real life implementation, there can be more ration function.
The full code is as below
import numpy as np
import pandas as pd
np.random.seed(0)
arr = np.random.rand(10,9)
tuples = [('a', 'E1', 'g1'), ('a', 'E1', 'g2'), ('a', 'E1', 'g3'), ('a', 'E2', 'g1'), ('a', 'E2', 'g2'),
('a', 'E2', 'g3'), ('a', 'E3', 'g1'), ('a', 'E3', 'g2'), ('a', 'E3', 'g3')]
df = pd.DataFrame(data=arr, columns=pd.MultiIndex.from_tuples(tuples))
print(df)
def ration_type1(df):
"""
(g3+g2)/g1
# Ugly way since have to convert to numpy 1st
"""
print(df)
dration = 'ration_type1'
l1, l2, _ = df.columns.tolist()[0]
total = df.loc[:, (slice(None), slice(None), 'g2')].to_numpy() + \
df.loc[:, (slice(None), slice(None), 'g3')].to_numpy()
arr = total / df.loc[:, (slice(None), slice(None), 'g1')].to_numpy()
return pd.DataFrame(data=arr, columns=pd.MultiIndex.from_tuples([(l1, l2, dration)]))
def ration_type2(df):
"""
(g3+g2+g1)/g1
# Ugly way since have to convert to numpy 1st
"""
dration = 'ration_type2'
l1, l2, _ = df.columns.tolist()[0]
total = df.loc[:, (slice(None), slice(None), 'g1')].to_numpy() + \
df.loc[:, (slice(None), slice(None), 'g2')].to_numpy() + \
df.loc[:, (slice(None), slice(None), 'g3')].to_numpy()
arr = total / df.loc[:, (slice(None), slice(None), 'g1')].to_numpy()
return pd.DataFrame(data=arr, columns=pd.MultiIndex.from_tuples([(l1, l2, dration)]))
level1_name = list(set(df.columns.get_level_values(1)))
all_df1 = [ration_type1(df.loc[:, (slice(None), dgroup, slice(None))]) for dgroup in level1_name]
all_df2 = [ration_type2(df.loc[:, (slice(None), dgroup, slice(None))]) for dgroup in level1_name]
df1 = pd.concat(all_df1, axis=1)
df2 = pd.concat(all_df2, axis=1)
df=pd.concat([df,df1,df2],axis=1)
Expected output.
a ...
E1 ... E2 E3
g1 g2 g3 ... ration_type2 ration_type2 ration_type2
0 0.548814 0.715189 0.602763 ... 3.401458 2.962896 5.240151
1 0.383442 0.791725 0.528895 ... 4.444124 2.754497 10.788191
2 0.778157 0.870012 0.978618 ... 3.375653 2.554145 7.622516
3 0.944669 0.521848 0.414662 ... 1.991363 5.650758 2.119612
4 0.612096 0.616934 0.943748 ... 3.549735 2.168255 2.042087
5 0.670638 0.210383 0.128926 ... 1.505949 3.960760 3.486126
6 0.208877 0.161310 0.653108 ... 4.899035 3.806001 5.822965
7 0.138183 0.196582 0.368725 ... 5.091008 2.138921 16.037821
8 0.976761 0.604846 0.739264 ... 2.376088 11.283905 2.474676
9 0.414263 0.064147 0.692472 ... 2.826423 2.391873 17.023361
[10 rows x 15 columns]
I am thinking to do something like using apply
# function for prepending 'Geek'
def multiply_by_2(number):
return 2 * number
# executing the function
df[["Integers", "Float"]] = df[["Integers", "Float"]].apply(multiply_by_2)
But, I am having difficulties (due to my limited knowledge)to do it since my example involving multiindex columns
Not so easy if working with
MultiIndex
- solution filter levels with renameg
values toration_type1, ration_type2
for possible divideMultiIndex
DataFrames:Simplier is reshape first:
Last reshape to original
MultiIndex
: