KNN - Triangular Inequality Optimization

323 Views Asked by At

I don't fully understand how the triangular inequality is used to optimise distance calculations in KNN classification.

I had written a python script referring the steps mentioned below

  1. Calculate the distance between each training pixel to the other.
  2. For each test sample
    • Calculate the distance from the first training sample as dn. This would be the current minimum distance.
    • Calculate the distance from the second training sample(p) as dp.
    • If dp < dn assign dn =dp
    • For each remaining training sample(c)
      • If distance between the sample c and sample p measured as dcp meets
        dp - dn < dcp < dp + dn
        Calculate distance from test sample to the sample c as dp
        If dp < dn, assign: dn = dp
      • Else, skip this training sample.
  3. Stop if there are no more training samples
  4. The class to which n belongs is the estimate.

Python Script:

def get_distance(p1 = (0, 0), p2 = (0, 0)):
    return abs(p1[0] - p2[0]) + abs(p1[1] - p2[1])


def algorithm(train_set, new_point):
    d_n = get_distance(new_point, train_set[0])
    d_p = get_distance(new_point, train_set[1])
    min_index = 0

    if d_p < d_n:
        d_n = d_p
        min_index = 1

    for c in range(2, len(train_set)):
        dcp = get_distance(train_set[min_index], train_set[c])

        if d_p - d_n < dcp < d_p + d_n:
            d_p = get_distance(new_point, train_set[c])

            if d_p < d_n:
                d_n = d_p
                min_index = c

    print(train_set[min_index], d_n)


train_set = [
    (0, 1, 'A'),
    (1, 1, 'A'),
    (2, 5, 'B'),
    (1, 8, 'A'),
    (5, 3, 'C'),
    (4, 2, 'C'),
    (3, 2, 'A'),
    (1, 7, 'B'),
    (4, 8, 'B'),
    (4, 0, 'A'),
]

for new_point in train_set:
    # Checking the distances from the points within training set iteself: min distance = 0, used for validation
    result_point = min(train_set, key = lambda x : get_distance(x, new_point))
    print(result_point, get_distance(result_point, new_point))

    algorithm(train_set, new_point)
    print('----------')

But it doesn't give the required result for 1 point.

Is my understanding of the optimization wrong?

Thank you in advance for any help.

0

There are 0 best solutions below