python pandas create new string column cumulatively based on other column condition

64 Views Asked by At

Suppose I have a dataset (df)

Group | Employee_Title | Employee_Name  
A     | Manager        | John     
A     | Analyst        | Adam     
A     | Analyst        | Smith    
B     | Manager        | Bill    
B     | Analyst        | Ed    
B     | Analyst        | Jay 

I want to create a new column "Group_Manager" so that the new dataset would be:

Group | Employee_Title | Employee_Name | Group_Manager 
A     | Manager        | John          | John
A     | Analyst        | Adam          | John           
A     | Analyst        | Smith         | John    
B     | Manager        | Bill          | Bill    
B     | Analyst        | Ed            | Bill       
B     | Analyst        | Jay           | Bill  

I am looking for python code that can do this in some "cumulative" way, like (not working right now) :

df['Group_Manager']=df.groupby('Group').apply(lambda Employee_Title,Employee_Name: Employee_Name if Employee_Title=="Manager" else keep previous Group_Manager)
1

There are 1 best solutions below

4
user3046211 On BEST ANSWER

By retrieving the manager's name for each group and then reindex it according to the main dataframe 'Group' column, you can achieve the results that you desired

import pandas as pd

# Sample data
data = {
    'Group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Employee_Title': ['Manager', 'Analyst', 'Analyst', 'Manager', 'Analyst', 'Analyst'],
    'Employee_Name': ['John', 'Adam', 'Smith', 'Bill', 'Ed', 'Jay']
}
df = pd.DataFrame(data)

# Create the Group_Manager column
df['Group_Manager'] = df.groupby('Group').apply(lambda g: g['Employee_Name'][g['Employee_Title'] == 'Manager'].iloc[0]).reindex(df['Group']).reset_index(drop=True)

print(df)

which results in

  Group Employee_Title Employee_Name Group_Manager
0     A        Manager          John          John
1     A        Analyst          Adam          John
2     A        Analyst         Smith          John
3     B        Manager          Bill          Bill
4     B        Analyst            Ed          Bill
5     B        Analyst           Jay          Bill

Another approach using itertools.accumulate

import pandas as pd
import itertools

# Sample data
data = {
    'Group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Employee_Title': ['Manager', 'Analyst', 'Analyst', 'Manager', 'Analyst', 'Analyst'],
    'Employee_Name': ['John', 'Adam', 'Smith', 'Bill', 'Ed', 'Jay']
}
df = pd.DataFrame(data)

# Create a series with manager names where title is 'Manager', and NaN elsewhere
managers = df['Employee_Name'].where(df['Employee_Title'] == 'Manager', None)

# Use accumulate to carry forward the manager names
df['Group_Manager'] = list(itertools.accumulate(managers, lambda x, y: x if y is None else y))

print(df)

which results in

  Group Employee_Title Employee_Name Group_Manager
0     A        Manager          John          John
1     A        Analyst          Adam          John
2     A        Analyst         Smith          John
3     B        Manager          Bill          Bill
4     B        Analyst            Ed          Bill
5     B        Analyst           Jay          Bill