Is there a more effective way to create new columns from row values in a Pandas DataFrame?

Given a dataframe with three columns, one of which (col1) is categorical with two categories, the aim is to create a separate column for each category and store in those columns the corresponding values from col2 and col3.

With the following dataframe,

col1 col2 col3
1 e 01
2 f 02
1 g 03
2 h 04

I need to reshape it so that new columns are created from the col1 values, producing the following dataframe:

col2.1 col2.2 col3.1 col3.2
e f 01 02
g h 03 04

Note that the suffix of each new column name is the 'col1' category it came from.

Using pandas, I achieve that with the following code.

import pandas as pd

data = [
{'col1': 1, 'col2': 'e', 'col3': '01'}, 
{'col1': 2, 'col2': 'f', 'col3': '02'},
{'col1': 1, 'col2': 'g', 'col3': '03'},
{'col1': 2, 'col2': 'h', 'col3': '04'}
]

df = pd.DataFrame(data)

# Create a new dataframe with the desired format
new_df = pd.DataFrame({
    'col2.1': df[df['col1'] == 1]['col2'].values,
    'col2.2': df[df['col1'] == 2]['col2'].values,
    'col3.1': df[df['col1'] == 1]['col3'].values,
    'col3.2': df[df['col1'] == 2]['col3'].values
})
new_df

Is there a better way to do this?

Thanks a lot.

PS: The techniques shown in this post could be adapted to the task, but aggregation functions should be avoided.
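
One possible adaptation that avoids aggregation functions, as a minimal sketch: number the rows within each col1 category with groupby(...).cumcount(), then unstack col1 into the columns. The names row_id and wide are illustrative and not part of the code above. If the categories had different numbers of rows, the missing cells would come out as NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 1, 2],
    'col2': ['e', 'f', 'g', 'h'],
    'col3': ['01', '02', '03', '04'],
})

# Position of each row within its col1 category: 0, 0, 1, 1
row_id = df.groupby('col1').cumcount()

# Put (row_id, col1) in the index, then move col1 into the columns.
# unstack only reshapes; it does not aggregate, and it raises an error
# if the same (row_id, col1) pair appears more than once.
wide = df.set_index([row_id, 'col1']).unstack('col1')

# Flatten the (column, category) MultiIndex into names like 'col2.1'
wide.columns = [f'{col}.{cat}' for col, cat in wide.columns]
wide = wide.reset_index(drop=True)

print(wide)
```

Unlike the boolean-mask version, this does not need the categories or the value columns to be listed by hand, so it should carry over unchanged to more categories or more columns.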
