Nearest neighbors with uncertain points

Question

Nearest neighbors with uncertain points

189 Views Asked by floflo29 At 17 August 2025 at 00:31

I have two 2D points sets A and B. I want to find the first nearest neighbor in A for each point in B. However, I am dealing with uncertain points (i.e. a point has a mean (2D vector) and a 2*2 covariance matrix).

I thus would like to use the Mahalanobis distance, but in scikit-learn (for example), I cannot pass a covariance matrix for each point, as it expects a single covariance matrix.

Currently, considering only the average locations (i.e. mean of my 2D normal distribution), I have:

nearest_neighbors = NearestNeighbors(n_neighbors=1, metric='l2').fit(A)
distance, indices = nearest_neighbors.kneighbors(B)

With my uncertain points, instead of using the L2 norm as a distance, I would rather compute (between a point a in A and a point b in B, their Mahalanobis distance:

d(a, b) = sqrt( transpose(mu_a-mu_b) * C * (mu_a-mu_b))

where C = inv(cov_a + cov_b)

where mu_a (resp mu_b) and cov_a (resp. cov_b) are the 2D mean and 2*2 covariance matrix of uncertain point a (resp. b).

Original Q&A

There are 2 best solutions below

jperezmartin On 06 January 2017 at 21:23

You can implement a KNN solution using your own distance function using list comprehension simply. This is an example using the Mahalanobis distance implementation built into the OpenCV library

import numpy as np
import cv2

np_gallery=np.array(gallery)
np_query=np.array(query)

K=12

ids=[]

def insertionsort(comp_list):
    for i in range( 1, len(comp_list)):
    tmp = comp_list[i]
    k = min(i,K)
    while k > 0 and tmp[1] < comp_list[k - 1][1]:
        comp_list[k] = comp_list[k - 1]
        k -= 1
    comp_list[k] = tmp

def search():
    for q in np_query:
        c = [(i,cv2.Mahalanobis(q, x, icovar)) for i, x in enumerate(np_gallery)]
        insertionsort(c)
        ids.append(map(lambda tup: tup[0], c[0:K]))

or

def search():
    for q in np_query:
        c = [(i,cv2.Mahalanobis(q, x, icovar)) for i, x in enumerate(np_gallery)]
        ids.append(map(lambda tup: tup[0], sorted(c, key=lambda tup: tup[1])[0:K]))

In the first case, I use a variant of insertion sort taking into account the parameter K. Which can be more efficient when N >> K

**floflo29** · Accepted Answer

I ended up using a custom distance:

def my_mahalanobis_distance(x, y):
    '''
    x: array of shape (4,) x[0]: mu_x_1, x[1]: mu_x_2, 
                            x[2]: cov_x_11, x[3]: cov_x_22
    y: array of shape (4,) y[0]: mu_ y_1, y[1]: mu_y_2,
                            y[2]: cov_y_11, y[3]: cov_y_22 
    '''     



    return sp.spatial.distance.mahalanobis(x[:2], y[:2], 
                                           np.linalg.inv(np.diag(x[2:]) 
                                           + np.diag(y[2:])))

Thus a point has 4 features:

x and y coordinates
x and y variances (covariance matrix is diagonal in my case)

Nearest neighbors with uncertain points

There are 2 best solutions below

Related Questions in PYTHON

Related Questions in SCIKIT-LEARN

Related Questions in NEAREST-NEIGHBOR

Related Questions in MAHALANOBIS

Trending Questions

Popular # Hahtags

Popular Questions