how to pivot only specific columns in a python dataframe

47 Views Asked by At

I am trying to create a process where I upload certain editable pdf data parse through the editable pdfs and then arrange the data in a df. I have managed to read the data in the editable pdfs and get them into a df, but I have run into an issue where the names on the individuals are appearing across the df as columns rather than individual row for each player, currently the rows for each player are appearing as name_#, position_#, country_# ...

This is not the actual data I am using just recreating the scenario I am facing.

df = pd.DataFrame({
    'Team': ["Bayern", "Barcelona", "Madrid"],
    'region': ["Bravaria", "Barcelona", "Madrid"],
    'title': ["Bundesliga", "Laliga", "Champions Leauge"],
    'name_1': ["Robben", "Messi", "Ronaldo"],
    'Position_1': ["RW", "ST", "ST"],
    'Country_1': ["Netherlands", "Argentina", "Portugal"],
    'name_2': ["Ribery", "Neymar", "Benzema"],
    'Position_2': ["LW", "LW", "RW"],
    'Country_2': ["FRANCE", "Brazil", "France"]})
Team region title name_1 Position_1 Country_1 name_2 Position_2 Country_2
0 Bayern Bravaria Bundesliga Robben RW Netherlands Ribery LW FRANCE
1 Barcelona Barcelona Laliga Messi ST Argentina Neymar LW Brazil
2 Madrid Madrid Champions Leauge Ronaldo ST Portugal Benzema RW France

I am trying to find a way to reposition pivot the DF so that it can look like the df shared below: Each Team is a different pdf, but the structure is all the same.

df1 = pd.DataFrame({
    'Team': ["Bayern", "Barcelona", "Madrid", "Barcelona", "Madrid", "Bayern"],
    'region': ["Bravaria", "Barcelona", "Madrid", "Barcelona", "Madrid", "Bravaria"],
    'title': ["Bundesliga", "Laliga", "Champions Leauge", "Laliga", "Champions Leauge", "Bundesliga"],
    'name': ["Robben", "Messi", "Ronaldo", "Neymar", "Benzema", "Ribery"],
    'Position': ["RW", "ST", "ST", "ST", "RW", "LW"],
    'Country': ["Netherlands", "Argentina", "Portugal", "Brazil", "France", "France"]})
Team region title name Position Country
0 Bayern Bravaria Bundesliga Robben RW Netherlands
1 Barcelona Barcelona Laliga Messi ST Argentina
2 Madrid Madrid Champions Leauge Ronaldo ST Portugal
3 Barcelona Barcelona Laliga Neymar ST Brazil
4 Madrid Madrid Champions Leauge Benzema RW France
5 Bayern Bravaria Bundesliga Ribery LW France
1

There are 1 best solutions below

0
wjandrea On

This is exactly what wide_to_long() is for.

pd.wide_to_long(
    df,
    ['name', 'Position', 'Country'],
    i='Team',
    j='n',
    sep='_',
)
                region             title     name Position      Country
Team      n
Bayern    1   Bravaria        Bundesliga   Robben       RW  Netherlands
Barcelona 1  Barcelona            Laliga    Messi       ST    Argentina
Madrid    1     Madrid  Champions Leauge  Ronaldo       ST     Portugal
Bayern    2   Bravaria        Bundesliga   Ribery       LW       FRANCE
Barcelona 2  Barcelona            Laliga   Neymar       LW       Brazil
Madrid    2     Madrid  Champions Leauge  Benzema       RW       France