Find maximum value of each group within a Pandas Frame

41 Views Asked by At

I do have a question, hoping that you could give me a little support. I looked into the archiv here, found a solution but that's taking much time and is not "beautiful", since works with Loops

Suppose you have a following frame

System    Country_Key    Name    Bank_number_length    Check rule for bank acct no.        

PEM       AD             Andorra     8                          2
PL1       AD             Andorra     15                         5
PPE       AD             Andorra     14                         5 
P11       AD             Andorra     9                          5  
P16       AD             Andorra     12                         4

PEM       AE             Emirates     3                         5 
PL1       AE             Emirates     15                        4
PPE       AE             Emirates     15                        5
P11       AE             Emirates     15                        6
P16       AE             Emirates     13                        5

I found the following approach for two columns Get the max value from each group with pandas.DataFrame.groupby However, in my case I do really have many columns and need to set the index for the first three columns "System", "Country_Key" and "Name"

my desire output would be the following

System    Country_Key    Name    Bank_number_length    Check rule for bank acct no.
PEM       AD           Andorra                               
PL1                                 15                        5
PPE                                                           5 
P11                                                           5  
P16                                                  

PEM       AE           Emirates                               
PL1                                 15                        
PPE                                 15                        
P11                                 15                        6
P16                                                 

So actually dropping the lowest values except the max value. Any kind of hint would be really benefical

1

There are 1 best solutions below

2
Ynjxsjmh On

You can try mask the not max value to empty string and mask the duplicated values to empty string

keys = ['Country_Key', 'Name']
cols = ['Bank_number_length',  'Check rule for bank acct no.']
df[cols] = df[cols].mask(df[cols].ne(df.groupby(keys)[cols].transform(max)), '')
df.loc[df.duplicated(keys), keys] = ''
print(df)

  System Country_Key      Name Bank_number_length Check rule for bank acct no.
0    PEM          AD   Andorra
1    PL1                                       15                            5
2    PPE                                                                     5
3    P11                                                                     5
4    P16
5    PEM          AE  Emirates
6    PL1                                       15
7    PPE                                       15
8    P11                                       15                            6
9    P16