I have the following pandas dataframe df:
cluster tag amount name
1 0 200 Michael
2 1 1200 John
2 1 900 Daniel
2 0 3000 David
2 0 600 Jonny
3 0 900 Denisse
3 1 900 Mike
3 1 3000 Kely
3 0 2000 Devon
What I need to do is add another column to df that holds, for every row, the name (from the name column) with the highest amount among the rows of the same cluster where tag is 1. In other words, the result should look like this:
cluster tag amount name highest_amount
1 0 200 Michael NaN
2 1 1200 John John
2 1 900 Daniel John
2 0 3000 David John
2 0 600 Jonny John
3 0 900 Denisse Kely
3 1 900 Mike Kely
3 1 3000 Kely Kely
3 0 2000 Devon Kely
I've tried something like this:
df.groupby('cluster')[['name','amount']].transform('max')[df['tag']==1]
but the problem with this is that the name does not repeat on every row. It will look like this:
cluster tag amount name highest_amount
1 0 200 Michael NaN
2 1 1200 John John
2 1 900 Daniel John
2 0 3000 David NaN
2 0 600 Jonny NaN
3 0 900 Denisse NaN
3 1 900 Mike Kely
3 1 3000 Kely Kely
3 0 2000 Devon NaN
Can someone please let me know how to add a condition with split apply combine, and have the solution repeated on each row?
You can do this as a two-stage process. First calculate a mapping series, then map by cluster:
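The code for this answer is not shown above, so what follows is only a minimal sketch of the two-stage idea, not necessarily the original snippet. It rebuilds the question's frame, keeps the rows where tag is 1, picks the largest amount per cluster, and maps the resulting names back by cluster. The variable name `mapping` is an assumption made for illustration; the column name `highest_amount` comes from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "cluster": [1, 2, 2, 2, 2, 3, 3, 3, 3],
    "tag": [0, 1, 1, 0, 0, 0, 1, 1, 0],
    "amount": [200, 1200, 900, 3000, 600, 900, 900, 3000, 2000],
    "name": ["Michael", "John", "Daniel", "David", "Jonny",
             "Denisse", "Mike", "Kely", "Devon"],
})

# Stage 1: build a Series indexed by cluster whose value is the name of the
# tag == 1 row with the largest amount ("mapping" is an illustrative name).
mapping = (
    df[df["tag"] == 1]
    .sort_values("amount", ascending=False)
    .drop_duplicates("cluster")          # keeps the first (largest) row per cluster
    .set_index("cluster")["name"]
)

# Stage 2: look up each row's cluster in the mapping. Clusters that have no
# tag == 1 row are missing from the mapping and therefore become NaN.
df["highest_amount"] = df["cluster"].map(mapping)
print(df)
```

Because the lookup is keyed on cluster, the name is repeated on every row of the cluster regardless of that row's own tag, which is the part the attempt in the question was missing.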
If you want to use groupby, here's one way:
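The snippet for this answer is also missing here, so the following is a hedged sketch of one groupby-based variant rather than the answerer's exact code. It assumes the condition is applied by filtering to tag == 1 before grouping, then uses `idxmax` to find the winning row in each cluster. The names `winners` and `mapping` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "cluster": [1, 2, 2, 2, 2, 3, 3, 3, 3],
    "tag": [0, 1, 1, 0, 0, 0, 1, 1, 0],
    "amount": [200, 1200, 900, 3000, 600, 900, 900, 3000, 2000],
    "name": ["Michael", "John", "Daniel", "David", "Jonny",
             "Denisse", "Mike", "Kely", "Devon"],
})

# Split: only rows with tag == 1 take part in the grouping.
# Apply: idxmax returns, per cluster, the index label of the largest amount.
winners = df[df["tag"] == 1].groupby("cluster")["amount"].idxmax()

# Combine: turn the winning rows into a cluster -> name lookup and
# broadcast it back to every row of the original frame.
mapping = df.loc[winners].set_index("cluster")["name"]
df["highest_amount"] = df["cluster"].map(mapping)
print(df)
```

If two tag == 1 rows in a cluster share the top amount, `idxmax` picks the first one it encounters. Cluster 1 has no tag == 1 row, so it never enters the groupby and ends up as NaN after the map.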