Making certain matches impossible in assignment problem (Hungarian algorithm)

216 Views Asked by At

Motivation:

I'm using scipy's python implementation of hungarian algorithm (https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linear_sum_assignment.html#scipy-optimize-linear-sum-assignment) for matching two sets of time events. So far so good, but I have a time difference limit (let's say 0.1s) for matching these events and I don't want any matches which would have time difference above this limit.

An example without any time difference limit:

from dataclasses import dataclass
from scipy.optimize import linear_sum_assignment

@dataclass(frozen=True)
class Event:
    t: float

expected = [Event(0), Event(1), Event(2)]
detected = [          Event(1), Event(2), Event(3)] # shifted to show
                                                    # time matching

def tdiff(e1: Event, e2: Event) -> float:
    return abs(e1.t - e2.t)

cost_matrix = [
    [
        tdiff(e1, e2)
        for e2 in detected
    ]
    for e1 in expected
]
print('cost matrix:')
for row in cost_matrix:
    print('[', ', '.join(map(str, row)), ']')

row_idx, col_idx = linear_sum_assignment(cost_matrix)
for i, j in zip(row_idx, col_idx):
    exp_ev = expected[i]
    det_ev = detected[j]
    td = cost_matrix[i][j]
    print('matches %s and %s, tdiff: %.3f' % (exp_ev, det_ev, td))

This prints:

cost matrix:
[ 1, 2, 3 ]
[ 0, 1, 2 ]
[ 1, 0, 1 ]
matches Event(t=0) and Event(t=1), tdiff: 1.000
matches Event(t=1) and Event(t=2), tdiff: 1.000
matches Event(t=2) and Event(t=3), tdiff: 1.000

This is clearly not what I want, because all matches are above the limit (0.1s).

What have I tried:

My solution is to artificially increase cost of "unwanted" matching in the cost matrix, and then I filter matches to return only "wanted" matches:

from dataclasses import dataclass
from scipy.optimize import linear_sum_assignment

@dataclass(frozen=True)
class Event:
    t: float

TDIFF_MAX = 0.1
expected = [Event(0), Event(1), Event(2)]
detected = [          Event(1), Event(2), Event(3)] # shifted to show
                                                    # time matching

def tdiff(e1: Event, e2: Event) -> float:
    return abs(e1.t - e2.t)

orig_cost_matrix = [
    [
        tdiff(e1, e2)
        for e2 in detected
    ]
    for e1 in expected
]
orig_max = max(map(max, orig_cost_matrix))

print('original cost matrix:')
for row in orig_cost_matrix:
    print('[', ', '.join(map(str, row)), ']')

cost_matrix = [
    [
        x if x <= TDIFF_MAX else orig_max + 1
        for x in row
    ]
    for row in orig_cost_matrix
]
print('modified cost matrix:')
for row in cost_matrix:
    print('[', ', '.join(map(str, row)), ']')

row_idx, col_idx = linear_sum_assignment(cost_matrix)
for i, j in zip(row_idx, col_idx):
    exp_ev = expected[i]
    det_ev = detected[j]
    td = cost_matrix[i][j]
    if td <= TDIFF_MAX:
        print('matches %s and %s, tdiff: %.3f' % (exp_ev, det_ev, td))

Output:

original cost matrix:
[ 1, 2, 3 ]
[ 0, 1, 2 ]
[ 1, 0, 1 ]
modified cost matrix:
[ 4, 4, 4 ]
[ 0, 4, 4 ]
[ 4, 0, 4 ]
matches Event(t=1) and Event(t=1), tdiff: 0.000
matches Event(t=2) and Event(t=2), tdiff: 0.000

This works fine, but I don't like this solution for 2 reasons:

  1. it looks ugly and is harder to understand
  2. in "real" application the expected and detected sets can be quite large, and since time difference limit is rather small, most of the matches are "unwanted", which makes me think this is rather inefficient and wastes a lot of CPU.

Question:

Is there abetter way to "disable" some possible matches (not necessary using hungarian algoritm..), which would work better in cases with large data sets where each Event has mostly only 1, 2 or 3 possible matches?

I'm mostly interested in algorithms, so feel free to post solutions in other languages than python or even in pseudocode.

0

There are 0 best solutions below