Suppose we have a numpy array A of size M x N which we interpret as M vectors of dimension N. For three vectors a,b,c we'd like to compute the cosine of the angle they form:
cos(angle(a,b,c)) = np.dot((a-b)/norm(a-b), (c-b)/norm(c-b))
We want to compute this quantity over triplets of A, of which there should be (M choose 2)*(M-2) unique triplets (by symmetry of a and c; please correct me if I'm wrong on this). We can of course accomplish this with a triply nested for loop, but I am hoping this can be done in a vectorized manner. I am pretty sure I can use some broadcasting tricks to compute an array that includes the desired outputs and more, but I am hoping someone can furnish a recipe that computes exactly the unique quantities, preferably without extra computation. Thanks.
edit. for completeness, naive impl. with loops:
angles = []
for i in range(len(A)):
for j in range(len(A)):
for k in range(i+1, len(A)):
if j not in (i,k):
d1 = A[i] - A[j]
d2 = A[k] - A[j]
ang = np.dot(d1/np.linalg.norm(d1), d2/np.linalg.norm(d2))
angles.append(ang)
I think that's right. I counted
M * ((M-1) choose 2), and that's equivalent.Well, let's start with the easy thing - vectorizing your loop, assuming we have arrays of indices
i,j, andkpre-generated.This reduces the problem to computing the arrays of indices, assuming that indexing the arrays is a small enough fraction of the total computation time. I can't see a way to avoid the redundant calculations without doing this sort of thing.
Then, if we're willing to allow redundant calculation, the simplest way to generate the indices is with
np.meshgrid.For
Aof shape (30, 3) on Colab, the Python loop method took 160ms, and this solution took 7ms.It's very easy to generate the unique sets of indices quickly if we're allowed to use Numba. This is basically just your code broken out into a function:
(Of course, we could also just use Numba on your whole loop.)
This took a little less than half the time as the redundant vectorized solution.
It might be possible to generate the indices efficiently using pure NumPy. Initially, I thought it was going to be as simple as:
That's not quite right, but it is possible to start with that and fix those indices with a loop over
range(M)(better than triple nesting, anyway!). Something like:~~I think this can be sped up by using just slices and no masks, but I can't do that tonight.~~
Update: as indicated in the comment, the last loop wasn't necessary!
And that's quite a bit faster than the Numba loop. I'm actually surprised that worked out!
All the code together, in case you want to run it: