How can I reshape my df and split one categorical column into several?

223 Views Asked by At

I have seperate csv data for each weather station, each climate scenario, each RCP and each variable. After appending the data an example of it looks like this:

Example:

import pandas as pd
df_before = pd.DataFrame({'canton': ['AG', 'AG', 'AG', 'AG', 'AG', 'AG', 'AG', 'AG', 'AG', 'BE', 'BE', 'BE'], 
                          'time': ['1981-01-01', '1981-01-01', '1981-01-01', '1981-01-01', '1981-01-01', 
                                   '1981-01-01', '1981-01-01', '1981-01-01', '1981-01-02', '1981-01-02', '1981-01-02', '1981-01-02'],
                           'value': [3, 5, 1, 2, 4, 5, 6, 7, 2, 1, 2, 3],
                          'stn': ['BUS', 'BUS', 'BUS', 'BUS', 'BUS', 'BUS', 'BUS', 'BUS', 'BUS', 'BER', 'BER', 'BER'],
                          'RCPs': ['RCP26', 'RCP26', 'RCP26', 'RCP45', 'RCP45', 'RCP45', 'RCP85', 'RCP85', 'RCP85', 'RCP26', 'RCP26', 'RCP26'],
                          'var': ['tas', 'tasmin','tasmax', 'tas', 'tasmin', 'tasmax', 'tas',  'tasmin',  'tasmax', 'tas', 'tasmin', 'tasmax', ]
                         })

df_before

However, I would want to split the value column on to the var (tas, tasmin, tasmax, pr) column, without loosing information from the RCPs and the station:

df_after = pd.DataFrame({'canton': ['AG', 'AG', 'AG', 'BE'], 
                          'time': ['1981-01-01', '1981-01-01', '1981-01-01', '1981-01-02'],
                          'stn': ['BUS', 'BUS', 'BUS', 'BER'],
                          'RCPs': ['RCP26', 'RCP45', 'RCP85', 'RCP26'],
                          'tas': ['3', '2', '6', '1'],
                          'tasmin': ['5', '4', '7', '2'],
                          'tasmax': ['1', '5', '2', '3']
                         })

Such that it looks like this: df_after

1

There are 1 best solutions below

0
On

Like I mentioned in the comments, you can use the pivot() method:

df_after = df_before.pivot(
    index=['canton','time','stn','RCPs'], columns='var', values='value'
).rename_axis(None, axis=1).reset_index()

print(df_after)
    canton        time  stn  RCPs   tas tasmax  tasmin
0       AG  1981-01-01  BUS RCP26     3      1       5
1       AG  1981-01-01  BUS RCP45     2      5       4
2       AG  1981-01-01  BUS RCP85     6      2       7
3       BE  1981-01-02  BER RCP26     1      3       2

The rename_axis() method removes the name of the column index after pivoting and reset_index() converts the multi-index created by pivot to dataframe columns.