Drop rows after maximum value in a grouped Pandas dataframe

Question

Drop rows after maximum value in a grouped Pandas dataframe

2k Views Asked by Chris B At 06 July 2018 at 16:09

I've got a date-ordered dataframe that can be grouped. What I am attempting to do is groupby a variable (Person), determine the maximum (weight) for each group (person), and then drop all rows that come after (date) the maximum.

Here's an example of the data:

df = pd.DataFrame({'Person': 1,1,1,1,1,2,2,2,2,2],'Date': '1/1/2015','2/1/2015','3/1/2015','4/1/2015','5/1/2015','6/1/2011','7/1/2011','8/1/2011','9/1/2011','10/1/2011'], 'MonthNo':[1,2,3,4,5,1,2,3,4,5], 'Weight':[100,110,115,112,108,205,210,211,215,206]})

    Date      MonthNo  Person  Weight
0   1/1/2015        1       1     100
1   2/1/2015        2       1     110
2   3/1/2015        3       1     115
3   4/1/2015        4       1     112
4   5/1/2015        5       1     108
5   6/1/2011        1       2     205
6   7/1/2011        2       2     210
7   8/1/2011        3       2     211
8   9/1/2011        4       2     215
9  10/1/2011        5       2     206

Here's what I want the result to look like:

    Date      MonthNo  Person  Weight
0   1/1/2015        1       1     100
1   2/1/2015        2       1     110
2   3/1/2015        3       1     115
5   6/1/2011        1       2     205
6   7/1/2011        2       2     210
7   8/1/2011        3       2     211
8   9/1/2011        4       2     215

I think its worth noting, there can be disjoint start dates and the maximum may appear at different times.

My idea was to find the maximum for each group, obtain the MonthNo the maximum was in for that group, and then discard any rows with MonthNo greater Max Weight MonthNo. So far I've been able to obtain the max by group, but cannot get past doing a comparison based on that.

Please let me know if I can edit/provide more information, haven't posted many questions here! Thanks for the help, sorry if my formatting/question isn't clear.

Original Q&A

There are 2 best solutions below

jpp On 06 July 2018 at 16:20

You can use groupby.transform with idxmax. The first 2 steps may not be necessary depending on how your dataframe is structured.

# convert Date to datetime
df['Date'] = pd.to_datetime(df['Date'])

# sort by Person and Date to make index usable for next step
df = df.sort_values(['Person', 'Date']).reset_index(drop=True)

# filter for index less than idxmax transformed by group
df = df[df.index <= df.groupby('Person')['Weight'].transform('idxmax')]

print(df)

        Date  MonthNo  Person  Weight
0 2015-01-01        1       1     100
1 2015-02-01        2       1     110
2 2015-03-01        3       1     115
5 2011-06-01        1       2     205
6 2011-07-01        2       2     210
7 2011-08-01        3       2     211
8 2011-09-01        4       2     215

**BENY** · Accepted Answer · 2018-07-06T16:12:31.503000

Using idxmax with groupby

df.groupby('Person',sort=False).apply(lambda x  : x.reset_index(drop=True).iloc[:x.reset_index(drop=True).Weight.idxmax()+1,:])

Out[131]: 
              Date  MonthNo  Person  Weight
Person                                     
1      0  1/1/2015        1       1     100
       1  2/1/2015        2       1     110
       2  3/1/2015        3       1     115
2      0  6/1/2011        1       2     205
       1  7/1/2011        2       2     210
       2  8/1/2011        3       2     211
       3  9/1/2011        4       2     215

Drop rows after maximum value in a grouped Pandas dataframe

There are 2 best solutions below

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in DATAFRAME

Related Questions in PANDAS-GROUPBY

Trending Questions

Popular # Hahtags

Popular Questions