I have the following pandas dataframe df:
cluster tag amount name
1 0 200 Michael
2 1 1200 John
2 1 900 Daniel
2 0 3000 David
2 0 600 Jonny
3 0 900 Denisse
3 1 900 Mike
3 1 3000 Kely
3 0 2000 Devon
What I need to do is add another column to df that holds, for each row, the name (from the name column) with the highest amount among the rows of the same cluster where tag is 1. In other words, the solution looks like this:
cluster tag amount name highest_amount
1 0 200 Michael NaN
2 1 1200 John John
2 1 900 Daniel John
2 0 3000 David John
2 0 600 Jonny John
3 0 900 Denisse Kely
3 1 900 Mike Kely
3 1 3000 Kely Kely
3 0 2000 Devon Kely
I've tried something like this:
df.groupby('cluster')[['name','amount']].transform('max')[df['tag']==1]
but the problem with this is that the name does not repeat on every row. It will look like this:
cluster tag amount name highest_amount
1 0 200 Michael NaN
2 1 1200 John John
2 1 900 Daniel John
2 0 3000 David NaN
2 0 600 Jonny NaN
3 0 900 Denisse NaN
3 1 900 Mike Kely
3 1 3000 Kely Kely
3 0 2000 Devon NaN
Can someone please let me know how to add a condition with split apply combine, and have the solution repeated on each row?
You can do this as a two-stage process. First calculate a mapping series, then map by cluster:
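The answer's original code is not shown here, so the following is only a sketch of what the two stages might look like. The intermediate name top is an assumption, and sort_values plus drop_duplicates is one possible way to pick the top row per cluster, not necessarily the one the answer used.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "cluster": [1, 2, 2, 2, 2, 3, 3, 3, 3],
    "tag": [0, 1, 1, 0, 0, 0, 1, 1, 0],
    "amount": [200, 1200, 900, 3000, 600, 900, 900, 3000, 2000],
    "name": ["Michael", "John", "Daniel", "David", "Jonny",
             "Denisse", "Mike", "Kely", "Devon"],
})

# Stage 1: build a mapping series (cluster -> name).
# Keep only tag == 1 rows, sort by amount descending, and keep the
# first (largest-amount) row of each cluster.
top = (
    df[df["tag"] == 1]
    .sort_values("amount", ascending=False)
    .drop_duplicates("cluster")
    .set_index("cluster")["name"]
)

# Stage 2: map every row's cluster to that name.
# Clusters with no tag == 1 rows are absent from the mapping and get NaN.
df["highest_amount"] = df["cluster"].map(top)

print(df)
```

Because the mapping is indexed by cluster, map repeats the name on every row of the cluster, including rows where tag is 0.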
If you want to use groupby, here's one way:
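The code for this answer is also missing, so this is a hedged sketch of one groupby-based variant rather than the answer's exact code. It assumes idxmax on the filtered rows is acceptable; the names idx and top are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "cluster": [1, 2, 2, 2, 2, 3, 3, 3, 3],
    "tag": [0, 1, 1, 0, 0, 0, 1, 1, 0],
    "amount": [200, 1200, 900, 3000, 600, 900, 900, 3000, 2000],
    "name": ["Michael", "John", "Daniel", "David", "Jonny",
             "Denisse", "Mike", "Kely", "Devon"],
})

# Apply the condition first, then group: for each cluster, find the
# index label of the tag == 1 row with the largest amount.
idx = df[df["tag"] == 1].groupby("cluster")["amount"].idxmax()

# Look up the names at those labels, keyed by cluster.
top = df.loc[idx].set_index("cluster")["name"]

# Broadcast the per-cluster name back onto every row of the cluster.
# Cluster 1 has no tag == 1 rows, so it gets NaN.
df["highest_amount"] = df["cluster"].map(top)

print(df)
```

Filtering before the groupby, instead of indexing the transform result with df['tag']==1 afterwards, is what lets the name appear on the tag 0 rows as well.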