I want to solve this combinatorial optimization task using Hungarian algorithm:
Different teams of the same company may have employees in different cities. If employees have the same position, we can swap them. And the goal is to distribute employees to teams in such a way as to maximize the number of teams that have employees which are together in the same city. Data we have consists of 4 columns: employee's id, employee's team, employee's position, employee's city. The goal is to redistribute employees across teams and get a table with 4 columns: employee's id, employee's team, employee's position, employee's city
Example: input:
employee_id team_id position city
1 0.0 Manager New York
2 0.0 Manager San Francisco
3 1.0 Engineer Boston
4 1.0 Engineer Boston
5 2.0 Engineer London
6 2.0 Engineer London
7 2.0 Manager New York
output:
employee_id team_id position city
1 0.0 Manager New York
7 0.0 Manager New York
3 1.0 Engineer Boston
4 1.0 Engineer Boston
5 2.0 Engineer London
6 2.0 Engineer London
2 2.0 Manager San Francisco
as you see employees with id 7 and 2 were swapped and now all employees of team 0.0 are in New York.
i wrote this code:
import numpy as np
import scipy.optimize as opt
def hungarian(cost_matrix):
row_ind, col_ind = opt.linear_sum_assignment(cost_matrix)
return row_ind, col_ind
def redistribute_employees(employee_data, cost_matrix):
n = len(employee_data)
row_ind, col_ind = hungarian(cost_matrix)
new_teams = np.zeros(n)
for i in range(n):
new_teams[i] = col_ind[int(employee_data[i, 1])]
return np.column_stack((employee_data[:, 0], new_teams, employee_data[:, 2], employee_data[:, 3]))
employee_data = np.array([[1, 0, 'Manager', 'New York'],
[2, 0, 'Manager', 'San Francisco'],
[3, 1, 'Engineer', 'Boston'],
[4, 1, 'Engineer', 'Boston'],
[5, 2, 'Engineer', 'London'],
[6, 2, 'Engineer', 'London'],
[7, 2, 'Manager', 'New York']])
city_map = {'New York': 0, 'San Francisco': 1, 'Boston': 2, 'London': 3}
n = len(employee_data)
cost_matrix = np.zeros((n, n))
for i in range(n):
for j in range(n):
if employee_data[i, 2] == employee_data[j, 2] and employee_data[i, 3] != employee_data[j, 3]:
cost_matrix[i, j] = 1
redistributed_employees = redistribute_employees(employee_data, cost_matrix)
print(redistributed_employees)
The condition employee_data[i, 2] == employee_data[j, 2] and employee_data[i, 3] != employee_data[j, 3] checks if two employees have the same position (employee_data[i, 2] == employee_data[j, 2]) but work in different cities (employee_data[i, 3] != employee_data[j, 3]). In other words, it checks if two employees can be swapped without affecting the position but making sure that they are not working in the same city.
print of cost_matrix is: array([[0., 1., 0., 0., 0., 0., 0.], [1., 0., 0., 0., 0., 0., 1.], [0., 0., 0., 0., 1., 1., 0.], [0., 0., 0., 0., 1., 1., 0.], [0., 0., 1., 1., 0., 0., 0.], [0., 0., 1., 1., 0., 0., 0.], [0., 1., 0., 0., 0., 0., 0.]])
But the output is:
[['1' '0.0' 'Manager' 'New York']
['2' '0.0' 'Manager' 'San Francisco']
['3' '1.0' 'Engineer' 'Boston']
['4' '1.0' 'Engineer' 'Boston']
['5' '2.0' 'Engineer' 'London']
['6' '2.0' 'Engineer' 'London']
['7' '2.0' 'Manager' 'New York']]
nothing changed and I don't understand why. the steps of algorithm I use are:
Create a cost matrix where the entries represent the cost of assigning an employee to a team. In this case, the cost could be defined as 1 if the employees are in different cities, and 0 if they are in the same city.
Run the Hungarian algorithm on the cost matrix to find a minimum-cost matching between employees and teams. The algorithm will find the minimum number of swaps needed to achieve a maximum number of employees in the same city.
Based on the minimum-cost matching, reassign employees to teams to maximize the number of employees in the same city.
The final output is a table with the reassigned employee ids, teams, positions, and cities
What am I doing wrong? How to fix my code?