pandas slice 3-level multiindex based on a list with 2 levels

56 Views Asked by At

Here is a minimal example:

import pandas as pd
import numpy as np
np.random.seed(0)
idx = pd.MultiIndex.from_product([[1,2,3], ['a', 'b', 'c'], [6, 7]])
df = pd.DataFrame(np.random.randn(18), index=idx)
selection = [(1, 'a'), (2, 'b')]

I would like to select all the rows in df that have as index that starts with any of the items in selection. So I would like to get the sub dataframe of df with the indices:

(1, 'a', 6), (1, 'a', 7), (2, 'b', 6), (2, 'b', 7)

What is the most straightforward/pythonian/pandasian way of doing this? What I found:

sel = [id[:2] in selection for id in df.index]
df.loc[sel]
3

There are 3 best solutions below

3
mozway On BEST ANSWER

You could use boolean indexing with isin:

out = df[df.index.isin(selection)]

Output:

              0
1 a 6  1.560268
    7  0.674709
2 b 6  0.848069
    7  0.130719

If you want to select other levels, drop the unused leading levels:

# here we want to select on levels 1 and 2
selection = [('a', 6), ('b', 7)]

df[df.index.droplevel(0).isin(selection)]

Output:

              0
1 a 6  1.560268
  b 7  0.137769
2 a 6  0.754946
  b 7  0.130719
3 a 6 -2.275646
  b 7 -2.199944
1
Andrej Kesely On

Try:

print(pd.concat([df.loc[(*t, slice(None)), :] for t in selection]))

Prints:

              0
1 a 6  0.701403
    7 -0.822042
2 b 6 -0.610558
    7 -0.433880
3
sammywemmy On

One option is with pyjanitor's select function:

# pip install pyjanitor
import janitor
df.select(rows=selection)
              0
1 a 6 -0.989212
    7 -1.951114
2 b 6  1.629402
    7  0.298262

This works because the first two rows of the tuples coincide with the first two levels of the index. @mozways' solution is more generic