Nearest neighbors with uncertain points

200 Views Asked by At

I have two 2D points sets A and B. I want to find the first nearest neighbor in A for each point in B. However, I am dealing with uncertain points (i.e. a point has a mean (2D vector) and a 2*2 covariance matrix).

I thus would like to use the Mahalanobis distance, but in scikit-learn (for example), I cannot pass a covariance matrix for each point, as it expects a single covariance matrix.

Currently, considering only the average locations (i.e. mean of my 2D normal distribution), I have:

nearest_neighbors = NearestNeighbors(n_neighbors=1, metric='l2').fit(A)
distance, indices = nearest_neighbors.kneighbors(B)

With my uncertain points, instead of using the L2 norm as a distance, I would rather compute (between a point a in A and a point b in B, their Mahalanobis distance:

d(a, b) = sqrt( transpose(mu_a-mu_b) * C * (mu_a-mu_b))

where C = inv(cov_a + cov_b)

where mu_a (resp mu_b) and cov_a (resp. cov_b) are the 2D mean and 2*2 covariance matrix of uncertain point a (resp. b).

2

There are 2 best solutions below

0
On BEST ANSWER

I ended up using a custom distance:

def my_mahalanobis_distance(x, y):
    '''
    x: array of shape (4,) x[0]: mu_x_1, x[1]: mu_x_2, 
                            x[2]: cov_x_11, x[3]: cov_x_22
    y: array of shape (4,) y[0]: mu_ y_1, y[1]: mu_y_2,
                            y[2]: cov_y_11, y[3]: cov_y_22 
    '''     



    return sp.spatial.distance.mahalanobis(x[:2], y[:2], 
                                           np.linalg.inv(np.diag(x[2:]) 
                                           + np.diag(y[2:])))

Thus a point has 4 features:

  • x and y coordinates
  • x and y variances (covariance matrix is diagonal in my case)
0
On

You can implement a KNN solution using your own distance function using list comprehension simply. This is an example using the Mahalanobis distance implementation built into the OpenCV library

import numpy as np
import cv2

np_gallery=np.array(gallery)
np_query=np.array(query)

K=12

ids=[]

def insertionsort(comp_list):
    for i in range( 1, len(comp_list)):
    tmp = comp_list[i]
    k = min(i,K)
    while k > 0 and tmp[1] < comp_list[k - 1][1]:
        comp_list[k] = comp_list[k - 1]
        k -= 1
    comp_list[k] = tmp

def search():
    for q in np_query:
        c = [(i,cv2.Mahalanobis(q, x, icovar)) for i, x in enumerate(np_gallery)]
        insertionsort(c)
        ids.append(map(lambda tup: tup[0], c[0:K]))

or

def search():
    for q in np_query:
        c = [(i,cv2.Mahalanobis(q, x, icovar)) for i, x in enumerate(np_gallery)]
        ids.append(map(lambda tup: tup[0], sorted(c, key=lambda tup: tup[1])[0:K]))

In the first case, I use a variant of insertion sort taking into account the parameter K. Which can be more efficient when N >> K