I am trying to get 2D coordinates between 0 and 1 from a distance matrix, following the methods that are specified in the following posts (Multi-Dimensional Scaling):
However, when trying to implement it I get an error and I struggle to find the source of this error. This is what I am doing:
This is the distance matrix (as you can see, the maximum distance is 1, so all the points can be between 0 and 1):
import numpy as np
import math
distance_matrix = np.array(
[
[0.0, 0.47659458, 0.22173311, 0.46660708, 0.78423276],
[0.47659458, 0.0, 0.69805139, 0.01200111, 0.6629441],
[0.22173311, 0.69805139, 0.0, 0.68249177, 1.0],
[0.46660708, 0.01200111, 0.68249177, 0.0, 0.6850815],
[0.78423276, 0.6629441, 1.0, 0.6850815, 0.0],
]
)
Here is where I do the Multi-Dimensional Scaling:
def x_coord_of_point(D, j):
return ( D[0,j]**2 + D[0,1]**2 - D[1,j]**2 ) / ( 2*D[0,1] )
def coords_of_point(D, j):
x = x_coord_of_point(D, j)
return np.array([x, math.sqrt( D[0,j]**2 - x**2 )])
def calculate_positions(D):
(m, n) = D.shape
P = np.zeros( (n, 2) )
tr = ( min(min(D[2,0:2]), min(D[2,3:n])) / 2)**2
P[1,0] = D[0,1]
P[2,:] = coords_of_point(D, 2)
for j in range(3,n):
P[j,:] = coords_of_point(D, j)
if abs( np.dot(P[j,:] - P[2,:], P[j,:] - P[2,:]) - D[2,j]**2 ) > tr:
P[j,1] = - P[j,1]
return P
P = calculate_positions(distance_matrix)
print(P)
Output: [[ 0. 0. ]
[ 0.47659458 0. ]
[-0.22132834 0.01339166]
[ 0.46656063 0.0065838 ]
[ 0.42244347 -0.66072879]]
I do this to make all the points between 0 and 1:
P = P-P.min(axis=0)
Once I have the set of points that are supposed to satisfy the distance matrix, I compute the distance matrix from the points to see if it is equal to the original one:
def compute_dist_matrix(P):
dist_matrix = []
for i in range(len(P)):
lis_pos = []
for j in range(len(P)):
dist = np.linalg.norm(P[i]-P[j])
lis_pos.append(dist)
dist_matrix.append(lis_pos)
return np.array(dist_matrix)
compute_dist_matrix(P)
Output: array([[0. , 0.47659458, 0.22173311, 0.46660708, 0.78423276],
[0.47659458, 0. , 0.69805139, 0.01200111, 0.6629441 ],
[0.22173311, 0.69805139, 0. , 0.68792266, 0.93213762],
[0.46660708, 0.01200111, 0.68792266, 0. , 0.66876934],
[0.78423276, 0.6629441 , 0.93213762, 0.66876934, 0. ]])
As you can see, if we compare this array with the original distance matrix at the beginning of the post, there is no error in the first terms of the matrix, but as we get closer to the end, the error gets bigger and bigger. If the distance matrix is bigger than the one I use in this example, the erros then become huge.
I do not know if the source of error is in the functions that compute P or maybe the function "compute_dist_matrix" is the problem.
Can you spot the source of error? Or maybe, is there an easier way to compute all of this? Maybe there are some functions in some library that already perform this transformation.