I have an array:
[[ 0.32730174 -0.1436172 -0.3355202 -0.2982458 ]
[ 0.50490916 -0.33826587 0.4315952 0.4850834 ]
[-0.18594801 -0.06028342 -0.24817085 -0.41029227]
[-0.22551994 0.47151482 -0.39798814 -0.14978702]
[-0.3315491 0.05832376 -0.29526958 0.3786153 ]]
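I have calculated its pairwise cosine values with "pdist", cosine_distance = 1 - pdist(array, metric='cosine'). Strictly speaking, pdist with metric='cosine' already returns the cosine distance, so 1 minus it is the cosine similarity.
I got this array: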
[-0.14822659 0.51635946 0.09485546 -0.38855427 -0.82434624 -0.86407176
-0.25101774 0.49793639 -0.07881047 0.41272145]
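A quick way to confirm what those numbers are is to recompute one of them by hand. This is a minimal sketch, assuming SciPy and NumPy are installed and using the array exactly as printed above. It checks that 1 - pdist(..., metric='cosine') for the first pair (rows 0 and 1) equals the plain cosine similarity of those two rows.

```python
import numpy as np
from scipy.spatial.distance import pdist

# The 5x4 array from the question, as printed.
X = np.array([
    [ 0.32730174, -0.1436172 , -0.3355202 , -0.2982458 ],
    [ 0.50490916, -0.33826587,  0.4315952 ,  0.4850834 ],
    [-0.18594801, -0.06028342, -0.24817085, -0.41029227],
    [-0.22551994,  0.47151482, -0.39798814, -0.14978702],
    [-0.3315491 ,  0.05832376, -0.29526958,  0.3786153 ],
])

# pdist(..., metric='cosine') returns the cosine distance 1 - cos(angle),
# so subtracting it from 1 gives back the cosine similarity.
cos_sim = 1 - pdist(X, metric='cosine')

# Cosine similarity of rows 0 and 1 computed directly; it matches cos_sim[0].
manual = X[0] @ X[1] / (np.linalg.norm(X[0]) * np.linalg.norm(X[1]))
print(round(float(manual), 4), round(float(cos_sim[0]), 4))  # -0.1482 -0.1482
```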
Now I want to get only those pairs whose value in cosine_distance is greater than 0.4 and less than 0.49. I have worked out how many values are at least 0.4 with number_points = len([1 for i in cosine_distance if i >= 0.4]), but I am not able to get the pairs themselves.
The trick is in how the documentation describes the output of pdist: it is a condensed vector with one entry per pair of rows, not a matrix.
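The ordering of that condensed vector can be sketched as follows. This assumes the index formula given in recent SciPy pdist documentation (pair i < j of m points is stored at entry m * i + j - ((i + 2) * (i + 1)) // 2); condensed_index is a hypothetical helper name, and the random data only stands in for the question's array.

```python
import numpy as np
from itertools import combinations
from scipy.spatial.distance import pdist, cosine

def condensed_index(i, j, m):
    """Hypothetical helper: position of pair (i, j), i < j, in pdist's output for m points."""
    return m * i + j - ((i + 2) * (i + 1)) // 2

# Any m x n data works; random values stand in for the question's array.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
m = len(X)
d = pdist(X, metric='cosine')

# pdist visits pairs in the order (0,1), (0,2), ..., (0,m-1), (1,2), ...
for k, (i, j) in enumerate(combinations(range(m), 2)):
    assert condensed_index(i, j, m) == k
    assert np.isclose(d[k], cosine(X[i], X[j]))

# Vectorised inverse mapping: entry k of d belongs to pair (rows[k], cols[k]).
rows, cols = np.triu_indices(m, k=1)
print(list(zip(rows.tolist(), cols.tolist()))[:4])  # [(0, 1), (0, 2), (0, 3), (0, 4)]
```

With the question's 5-row array, entry 9 of the vector (0.41272145) therefore belongs to rows 3 and 4.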
The documentation also refers to squareform, which turns the distance vector back into a matrix, and then its explanation of the output array makes sense: the ij position in the documentation becomes the first and second index of the matrix created by the squareform operation, so entry [i, j] holds the value for rows i and j. We can then get every distance for every point pair.
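Applied to the question, the squareform route might look like the sketch below. It assumes SciPy and NumPy, reuses the array as printed in the question, and uses strict bounds for "greater than 0.4 and less than 0.49". np.triu keeps each pair once (i < j) and drops the zero diagonal that squareform fills in.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# The 5x4 array from the question, as printed.
X = np.array([
    [ 0.32730174, -0.1436172 , -0.3355202 , -0.2982458 ],
    [ 0.50490916, -0.33826587,  0.4315952 ,  0.4850834 ],
    [-0.18594801, -0.06028342, -0.24817085, -0.41029227],
    [-0.22551994,  0.47151482, -0.39798814, -0.14978702],
    [-0.3315491 ,  0.05832376, -0.29526958,  0.3786153 ],
])

# Same values as the question's cosine_distance (1 - cosine distance).
sim = 1 - pdist(X, metric='cosine')

# squareform rebuilds the symmetric m x m matrix: S[i, j] is the value for rows i and j.
S = squareform(sim)

# Apply the 0.4 < value < 0.49 filter, keeping only the upper triangle (i < j).
mask = np.triu((S > 0.4) & (S < 0.49), k=1)
pairs = [(int(i), int(j)) for i, j in zip(*np.nonzero(mask))]
print(pairs)  # [(3, 4)]
```

Only rows 3 and 4 qualify here (value about 0.4127); the pairs at about 0.498 and 0.516 fall just above the 0.49 upper bound.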