python pandas create new string column cumulatively based on other column condition

Question

64 Views Asked by datapy At 14 October 2023 at 05:24

Suppose I have a dataset (df)

Group | Employee_Title | Employee_Name  
A     | Manager        | John     
A     | Analyst        | Adam     
A     | Analyst        | Smith    
B     | Manager        | Bill    
B     | Analyst        | Ed    
B     | Analyst        | Jay

I want to create a new column "Group_Manager" so that the new dataset would be:

Group | Employee_Title | Employee_Name | Group_Manager 
A     | Manager        | John          | John
A     | Analyst        | Adam          | John           
A     | Analyst        | Smith         | John    
B     | Manager        | Bill          | Bill    
B     | Analyst        | Ed            | Bill       
B     | Analyst        | Jay           | Bill

I am looking for Python code that can do this in a "cumulative" way, something like the following (which does not work as written):

df['Group_Manager']=df.groupby('Group').apply(lambda Employee_Title,Employee_Name: Employee_Name if Employee_Title=="Manager" else keep previous Group_Manager)

Original Q&A

**user3046211** · Accepted Answer · 2023-10-14T05:59:13.150000

You can get the desired result by retrieving the manager's name for each group and then reindexing that result by the 'Group' column of the main DataFrame:

import pandas as pd

# Sample data
data = {
    'Group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Employee_Title': ['Manager', 'Analyst', 'Analyst', 'Manager', 'Analyst', 'Analyst'],
    'Employee_Name': ['John', 'Adam', 'Smith', 'Bill', 'Ed', 'Jay']
}
df = pd.DataFrame(data)

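# Create the Group_Manager column:
# 1. find each group's manager name (a Series indexed by Group),
# 2. reindex it by the Group value of every row,
# 3. drop the Group index so the values line up with df's default 0..n-1 index.
manager_by_group = df.groupby('Group').apply(
    lambda g: g['Employee_Name'][g['Employee_Title'] == 'Manager'].iloc[0]
)
df['Group_Manager'] = manager_by_group.reindex(df['Group']).reset_index(drop=True)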

print(df)

which results in

  Group Employee_Title Employee_Name Group_Manager
0     A        Manager          John          John
1     A        Analyst          Adam          John
2     A        Analyst         Smith          John
3     B        Manager          Bill          Bill
4     B        Analyst            Ed          Bill
5     B        Analyst           Jay          Bill
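
The same per-group lookup can also be expressed with `Series.map`, which does not depend on df having the default 0..n-1 index. This is only a minimal sketch, assuming every group has exactly one row titled 'Manager'; the variable name `manager_by_group` is illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data (same as above)
df = pd.DataFrame({
    'Group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Employee_Title': ['Manager', 'Analyst', 'Analyst', 'Manager', 'Analyst', 'Analyst'],
    'Employee_Name': ['John', 'Adam', 'Smith', 'Bill', 'Ed', 'Jay'],
})

# Keep only the manager rows and index them by Group,
# giving a Series that maps each group to its manager's name
manager_by_group = (
    df.loc[df['Employee_Title'] == 'Manager']
      .set_index('Group')['Employee_Name']
)

# Look up every row's group in that mapping
df['Group_Manager'] = df['Group'].map(manager_by_group)

print(df)
```

A group with no manager row would get a missing value in Group_Manager, and a group with more than one manager row would make the lookup index non-unique, so the single-manager assumption matters here.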

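Another approach uses itertools.accumulate. It carries the most recent manager name forward row by row and does not look at the Group column, so it assumes each group's manager row comes before that group's other rows: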

import pandas as pd
import itertools

# Sample data
data = {
    'Group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Employee_Title': ['Manager', 'Analyst', 'Analyst', 'Manager', 'Analyst', 'Analyst'],
    'Employee_Name': ['John', 'Adam', 'Smith', 'Bill', 'Ed', 'Jay']
}
df = pd.DataFrame(data)

# Create a series with manager names where title is 'Manager', and NaN elsewhere
managers = df['Employee_Name'].where(df['Employee_Title'] == 'Manager')

# Use accumulate to carry forward the manager names; pd.isna treats both
# None and NaN as missing, so the check does not depend on which one pandas stores
df['Group_Manager'] = list(itertools.accumulate(managers, lambda x, y: x if pd.isna(y) else y))

print(df)

which results in

  Group Employee_Title Employee_Name Group_Manager
0     A        Manager          John          John
1     A        Analyst          Adam          John
2     A        Analyst         Smith          John
3     B        Manager          Bill          Bill
4     B        Analyst            Ed          Bill
5     B        Analyst           Jay          Bill
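
For a "cumulative" version that stays inside pandas, one option is to blank out the non-manager names and forward-fill within each group. This is a sketch rather than part of the original answer, and it assumes the manager row appears first within each group, as in the sample data.

```python
import pandas as pd

# Sample data (same as above)
df = pd.DataFrame({
    'Group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Employee_Title': ['Manager', 'Analyst', 'Analyst', 'Manager', 'Analyst', 'Analyst'],
    'Employee_Name': ['John', 'Adam', 'Smith', 'Bill', 'Ed', 'Jay'],
})

# Keep the name only on manager rows; all other rows become missing values
managers = df['Employee_Name'].where(df['Employee_Title'] == 'Manager')

# Forward-fill within each group, so a manager's name is never
# carried over into the rows of the next group
df['Group_Manager'] = managers.groupby(df['Group']).ffill()

print(df)
```

Grouping before the fill is the design choice that matters: a plain `managers.ffill()` would behave like the accumulate version and could leak one group's manager into the next group if that group's manager row were missing.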
