After pivoting around two columns with a separate value column, I want a df with multiindex columns in a specific order, like so (please ignore that multi-2 and multi-3 labels are pointless in the simplified example):

multi-1       one       two
multi-2   multi-2   multi-2
multi-3  SomeText  SomeText
mIndex                     
bar     -1.788089 -0.631030
baz     -1.836282  0.762363
foo     -1.104848 -0.444981
qux     -0.484606 -0.507772

Starting with a multiindex series of values, labelled multi-2, I create a three column df: column 1 - the serie's indexes (multi-1); column 2 - the values (multi-2); plus another column (multi-3), which I really only want for the column label. I then want to pivot this df around multi-1 and multi-3, with values multi-2. PROBLEM: The multiindex column labels MUST always be in a specific order: multi-1, multi-2, then multi-3.

import pandas as pd
import numpy as np

arrays = [["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
          ["one", "two", "one", "two", "one", "two", "one", "two"]]

tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=["mIndex", "multi-1"])

s = pd.Series(np.random.randn(8), index=index)
s.rename("multi-2", inplace=True)

df = pd.DataFrame(s.reset_index(level=["multi-1"]))
df["multi-3"] = "SomeText"

df = df.pivot(columns={"multi-1", "multi-3"}, values=["multi-2"])
df = df.swaplevel(0,1, axis=1)      # option 1: works only sometimes
# ???? how do I name the values level ????
df = df.reorder_levels("multi-1", "multi-2", "multi-3")  # option 2: set fixed order

Including multi-2 in the columns during the pivot creates another level.

The .swaplevel method does not always return the same order because (I guess) the original index order is not always the same following the pivot. Can this be right?!?

To use the reorder_levels, I need to somehow set an index label for the multi-2 value level (which is currently "None", along side "Multi-1" and "Multi-3").

Is there a way to set the label during the pivot? or after the pivot in a way that doesn't use the index (which seems to change somehow)? Or another way to get the same outcome?

1

There are 1 best solutions below

0
mozway On BEST ANSWER

After pivot, the values don't have an index name, you have to assign it:

(df.pivot(columns={'multi-1', 'multi-3'}, values=['multi-2'])
   .rename_axis(columns={None: 'multi-2'})
   .reorder_levels(['multi-1', 'multi-2', 'multi-3'], axis=1) 
)

Output:

multi-1       one       two
multi-2   multi-2   multi-2
multi-3  SomeText  SomeText
mIndex                     
bar      0.938079 -1.051440
baz      0.263281  1.388145
foo     -0.965295  0.611163
qux     -1.120318 -0.529974

Alternatively: swaplevel doesn't work consistently because you used a set (that is unordered) in pivot, use a list instead:

(df.pivot(columns=['multi-1', 'multi-3'], values=['multi-2'])
   .swaplevel(0, 1, axis=1)
)

NB. you can also add .rename_axis(columns={None: 'multi-2'}) if desired.

Output:

multi-1       one       two
          multi-2   multi-2
multi-3  SomeText  SomeText
mIndex                     
bar      0.542184 -0.199041
baz      1.253028 -1.006294
foo      0.252699 -1.728199
qux      0.572631 -0.694103

# with more columns
# columns=['multi-1', 'multi-3', 'multi-4', 'multi-5']

multi-1       one       two
          multi-2   multi-2
multi-3  SomeText  SomeText
multi-4  SomeText  SomeText
multi-5  SomeText  SomeText
mIndex                     
bar      0.071546  0.264463
baz      0.516355  1.594471
foo     -0.194536 -1.344563
qux     -0.197232 -0.845405