Given a dataframe with three columns, one of which is categorical with two categories, the aim is to create a separate column per category for each of columns 2 and 3, and store the corresponding values in them.
For example, with the following dataframe:
| col1 | col2 | col3 |
|---|---|---|
| 1 | e | 01 |
| 2 | f | 02 |
| 1 | g | 03 |
| 2 | h | 04 |
I need to reshape it so that each value of col1 gets its own set of columns, producing the following dataframe:
| col2.1 | col2.2 | col3.1 | col3.2 |
|---|---|---|---|
| e | f | 01 | 02 |
| g | h | 03 | 04 |
Note that the suffix of each new column name (for example, the '.1' in 'col2.1') is the corresponding category of 'col1'.
With pandas, I currently achieve this with the following code:
```python
import pandas as pd

data = [
    {'col1': 1, 'col2': 'e', 'col3': '01'},
    {'col1': 2, 'col2': 'f', 'col3': '02'},
    {'col1': 1, 'col2': 'g', 'col3': '03'},
    {'col1': 2, 'col2': 'h', 'col3': '04'}
]
df = pd.DataFrame(data)

# Create a new dataframe with the desired format: for each category of col1,
# select the matching rows and take the raw values of col2 and col3
new_df = pd.DataFrame({
    'col2.1': df[df['col1'] == 1]['col2'].values,
    'col2.2': df[df['col1'] == 2]['col2'].values,
    'col3.1': df[df['col1'] == 1]['col3'].values,
    'col3.2': df[df['col1'] == 2]['col3'].values
})
print(new_df)
```
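The same boolean-mask idea can be generalized so that the categories and value columns are not hard-coded. This is a minimal sketch, assuming every column other than 'col1' should be spread out and that each category has the same number of rows (otherwise `pd.DataFrame` raises a `ValueError` because the arrays have different lengths).

```python
import pandas as pd

data = [
    {'col1': 1, 'col2': 'e', 'col3': '01'},
    {'col1': 2, 'col2': 'f', 'col3': '02'},
    {'col1': 1, 'col2': 'g', 'col3': '03'},
    {'col1': 2, 'col2': 'h', 'col3': '04'},
]
df = pd.DataFrame(data)

# Assumption: every column except the key 'col1' is a value column to spread.
value_cols = [c for c in df.columns if c != 'col1']
# Categories in order of first appearance (1, 2 for the sample data).
categories = df['col1'].unique()

# Same selection as the original code, generated by a comprehension.
# Column order is col2.1, col2.2, col3.1, col3.2 (value column first, then category).
new_df = pd.DataFrame({
    f'{col}.{cat}': df.loc[df['col1'] == cat, col].to_numpy()
    for col in value_cols
    for cat in categories
})
print(new_df)
```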
Is there a better way to do this?
Thanks a lot.
P.S. The techniques shown in this post could be adapted to the task, but aggregation functions should be avoided.
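One way to stay within the no-aggregation constraint is `DataFrame.pivot`, which only reshapes: unlike `pivot_table`, it applies no aggregation function and raises a `ValueError` if an (index, columns) pair occurs twice. It needs a row identifier, which this sketch builds with `groupby('col1').cumcount()`; that call numbers rows within each category rather than reducing them. The helper column name 'row' is hypothetical.

```python
import pandas as pd

data = [
    {'col1': 1, 'col2': 'e', 'col3': '01'},
    {'col1': 2, 'col2': 'f', 'col3': '02'},
    {'col1': 1, 'col2': 'g', 'col3': '03'},
    {'col1': 2, 'col2': 'h', 'col3': '04'},
]
df = pd.DataFrame(data)

# Position of each row within its col1 category (0, 0, 1, 1 here);
# it becomes the row index of the reshaped frame.
row_id = df.groupby('col1').cumcount()

# Pure reshape, no aggregation: columns become a (value column, category) MultiIndex.
wide = df.assign(row=row_id).pivot(index='row', columns='col1', values=['col2', 'col3'])

# Flatten ('col2', 1) -> 'col2.1', etc., and drop the helper index.
wide.columns = [f'{col}.{cat}' for col, cat in wide.columns]
wide = wide.reset_index(drop=True)
print(wide)
```

Unlike the mask-based version, this also tolerates categories with different row counts: missing cells are filled with NaN instead of raising an error.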