Flattening Multi-Index Column with Date

31 Views Asked by At

I have a multi-index column dataframe that also has date. I would like the columns to be as rows.

For example:

| A | A | B | B |

| X | Y | X | Y | Date |


| 1 | 2 | 3 | 4 | May-08 |

And I need it to be

| Start | End | Value | Date |


| A | X | 1 | May-08 |

| A | Y | 2 | May-08 |

etc.

Sorry for the wonky formatting. When I tried to add code, the submission form freaked out and I couldn't figure out how to make it work.

Please let me know if you have any questions and I will answer them to the best of my ability

1

There are 1 best solutions below

0
mozway On

Assuming this input:

df = pd.DataFrame([[1,2,3,4,'May-08']],
                  columns=pd.MultiIndex.from_tuples([
                      ('A', 'X'), ('A', 'Y'),
                      ('B', 'X'), ('B', 'Y'),
                      (None, 'Date')
                  ]))

#    A     B        NaN
#    X  Y  X  Y    Date
# 0  1  2  3  4  May-08

You can melt with a bit of pre-processing of the MultiIndex to remove NaNs/None (that don't melt well), and post-processing of the resulting Index to remove the tuple:

# only necessary if you have None/NaN in the MultiIndex
df.columns = pd.MultiIndex.from_frame(df.columns.to_frame().fillna(''))

out = (df.melt([('', 'Date')], var_name=['Start', 'End'], value_name='Value')
         .rename(columns={('', 'Date'): 'Date'})
      )

If the empty levels in the MultiIndex are '', you can skip the pre-processing.

Output:

     Date Start End  Value
0  May-08     A   X      1
1  May-08     A   Y      2
2  May-08     B   X      3
3  May-08     B   Y      4