Cartesian (cross) products and np.unique()


Depending on your view of my approach, this is either a question about using np.unique() on awkward1 arrays or a call for a better approach:

Let a and b be two awkward1 arrays of the same outer length (number of events) but different inner lengths. For example:

a = [[1, 2], [3] , [] , [4, 5, 6]]

b = [[7] , [3, 5], [6], [8, 9]]

Let f: (x, y) -> z be a function that acts on two numbers x and y and results in the number z. For example:

f(x, y):= y - x

The idea is to compare every element in a with every element in b via f, event by event, and keep only those (a, b) pairs that pass a cut applied to f. For example:

f(x, y) < 4
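To pin down the expected result independently of awkward1, here is a minimal plain-Python sketch of the same comparison. The helper name surviving_pairs and its threshold parameter are made up for illustration; the loop is only a slow reference, not a replacement for a vectorized solution.

```python
from itertools import product

a = [[1, 2], [3], [], [4, 5, 6]]
b = [[7], [3, 5], [6], [8, 9]]

def f(x, y):
    # The example function from the question.
    return y - x

def surviving_pairs(a_events, b_events, threshold=4):
    """Hypothetical helper: per event, keep all (x, y) pairs with f(x, y) < threshold."""
    result = []
    for xs, ys in zip(a_events, b_events):
        # product(xs, ys) varies y fastest, i.e. pairs are ordered by x first.
        result.append([{'x': x, 'y': y} for x, y in product(xs, ys)
                       if f(x, y) < threshold])
    return result

expected_new = surviving_pairs(a, b)
print(expected_new)
```

Any array-based solution should reproduce this list, including the empty entries for events without surviving pairs.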

My approach for this is:

import awkward1 as ak
import numpy as np

a = ak.from_iter(a)
b = ak.from_iter(b)

c = ak.cartesian({'x':a, 'y':b})
#c= [[{'x': 1, 'y': 7}, {'x': 2, 'y': 7}], [{'x': 3, 'y': 3}, {'x': 3, 'y': 5}], [], [{'x': 4, 'y': 8}, {'x': 4, 'y': 9}, {'x': 5, 'y': 8}, {'x': 5, 'y': 9}, {'x': 6, 'y': 8}, {'x': 6, 'y': 9}]]

i = ak.argcartesian({'x':a, 'y':b})
#i= [[{'x': 0, 'y': 0}, {'x': 1, 'y': 0}], [{'x': 0, 'y': 0}, {'x': 0, 'y': 1}], [], [{'x': 0, 'y': 0}, {'x': 0, 'y': 1}, {'x': 1, 'y': 0}, {'x': 1, 'y': 1}, {'x': 2, 'y': 0}, {'x': 2, 'y': 1}]]

diff = c['y'] - c['x']
#diff= [[6, 5], [0, 2], [], [4, 5, 3, 4, 2, 3]]

cut = diff < 4
#cut= [[False, False], [True, True], [], [False, False, True, False, True, True]]

new = c[cut]
#new= [[], [{'x': 3, 'y': 3}, {'x': 3, 'y': 5}], [], [{'x': 5, 'y': 8}, {'x': 6, 'y': 8}, {'x': 6, 'y': 9}]]

new_i = i[cut]
#new_i= [[], [{'x': 0, 'y': 0}, {'x': 0, 'y': 1}], [], [{'x': 1, 'y': 0}, {'x': 2, 'y': 0}, {'x': 2, 'y': 1}]]

It is possible that pairs with the same element from a but different elements from b survive the cut. (e.g. {'x': 3, 'y': 3} and {'x': 3, 'y': 5})

My goal is to group those pairs with the same element from a together and therefore reshape the new array into:

new = [[], [{'x': 3, 'y': [3, 5]}], [], [{'x': 5, 'y': 8}, {'x': 6, 'y': [8, 9]}]]
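The intended regrouping can be written down in plain Python as a reference. This is only a sketch: group_by_a_index is a hypothetical helper, the input literals are copied from the printouts of new and new_i above, and it always stores y as a list (so a single match becomes [8] rather than 8). Grouping by the index of the a element rather than by its value keeps equal values at different positions apart.

```python
# Values copied from the printouts of `new` and `new_i` above.
new = [[], [{'x': 3, 'y': 3}, {'x': 3, 'y': 5}], [],
       [{'x': 5, 'y': 8}, {'x': 6, 'y': 8}, {'x': 6, 'y': 9}]]
new_i = [[], [{'x': 0, 'y': 0}, {'x': 0, 'y': 1}], [],
         [{'x': 1, 'y': 0}, {'x': 2, 'y': 0}, {'x': 2, 'y': 1}]]

def group_by_a_index(pairs, indices):
    """Hypothetical helper: merge the pairs of one event that share an `a` index."""
    groups = {}  # regular dicts keep insertion order in Python 3.7+
    for pair, idx in zip(pairs, indices):
        entry = groups.setdefault(idx['x'], {'x': pair['x'], 'y': []})
        entry['y'].append(pair['y'])
    return list(groups.values())

grouped = [group_by_a_index(p, i) for p, i in zip(new, new_i)]
print(grouped)
```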

My only idea for achieving this is to build a list of the indices of a that are still present after the cut, using new_i:

i = new_i['x']
#i= [[], [0, 0], [], [1, 2, 2]]

However, I need a unique version of this list so that every index appears only once. In NumPy this could be done with np.unique(), but that does not work on awkward1 arrays:

np.unique(i)

<__array_function__ internals> in unique(*args, **kwargs)

TypeError: no implementation found for 'numpy.unique' on types that implement __array_function__: [<class 'awkward1.highlevel.Array'>]
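For reference, the per-event result I am after can be stated in plain Python. This is a sketch that assumes the index array has first been converted to nested Python lists (for example with ak.to_list(i)); the literal below is copied from the printout of i above. Note that np.unique() without an axis argument flattens its input, so even on a regular NumPy array it would not give per-event uniqueness by itself.

```python
# Per-event indices into `a`, copied from the printout of `i` above.
i_list = [[], [0, 0], [], [1, 2, 2]]

# Deduplicate within each event separately; sorting gives a deterministic order.
unique_i = [sorted(set(event)) for event in i_list]
print(unique_i)  # [[], [0], [], [1, 2]]
```

This loops over events in Python, so it is likely to be slow for large arrays.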

My question:

Is there an np.unique() equivalent in awkward1, and/or would you recommend a different approach to my problem?

There is 1 solution below


Okay, I still don't know how to use np.unique() on my arrays, but I found a solution for my own problem:

In my previous approach I used the following code to pair up both arrays:

c = ak.cartesian({'x':a, 'y':b})
#c= [[{'x': 1, 'y': 7}, {'x': 2, 'y': 7}], [{'x': 3, 'y': 3}, {'x': 3, 'y': 5}], [], [{'x': 4, 'y': 8}, {'x': 4, 'y': 9}, {'x': 5, 'y': 8}, {'x': 5, 'y': 9}, {'x': 6, 'y': 8}, {'x': 6, 'y': 9}]]

However, with the nested = True parameter of ak.cartesian() I get a list that is grouped by the elements of a:

c = ak.cartesian({'x':a, 'y':b}, axis = 1, nested = True)
#c= [[[{'x': 1, 'y': 7}], [{'x': 2, 'y': 7}]], [[{'x': 3, 'y': 3}, {'x': 3, 'y': 5}]], [], [[{'x': 4, 'y': 8}, {'x': 4, 'y': 9}], [{'x': 5, 'y': 8}, {'x': 5, 'y': 9}], [{'x': 6, 'y': 8}, {'x': 6, 'y': 9}]]]

After the cut I end up with:

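# the cut has to be recomputed so that it has the same nested structure as c
cut = (c['y'] - c['x']) < 4
new = c[cut]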
#new= [[[], []], [[{'x': 3, 'y': 3}, {'x': 3, 'y': 5}]], [], [[], [{'x': 5, 'y': 8}], [{'x': 6, 'y': 8}, {'x': 6, 'y': 9}]]]
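What the nesting and the cut do can be emulated in plain Python. This is only an illustration of the resulting structure, not of how awkward1 works internally; nested_cartesian is a hypothetical stand-in written for this sketch.

```python
a = [[1, 2], [3], [], [4, 5, 6]]
b = [[7], [3, 5], [6], [8, 9]]

def nested_cartesian(a_events, b_events):
    """Hypothetical stand-in: one inner list per element of `a`, holding its pairs."""
    return [[[{'x': x, 'y': y} for y in ys] for x in xs]
            for xs, ys in zip(a_events, b_events)]

c_nested = nested_cartesian(a, b)

# Apply the cut y - x < 4 inside each group; emptied groups stay as [].
new_nested = [[[p for p in group if p['y'] - p['x'] < 4] for group in event]
              for event in c_nested]
print(new_nested)
```

Because each inner list belongs to exactly one element of a, groups that lose all their pairs remain as empty lists instead of disappearing.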

I extract the y values and reduce the innermost layer of the nested lists in new to a single element:

y = new['y']
#y= [[[], []], [[3, 5]], [], [[], [8], [8, 9]]]

new = ak.firsts(new, axis = 2)
#new= [[None, None], [{'x': 3, 'y': 3}], [], [None, {'x': 5, 'y': 8}, {'x': 6, 'y': 8}]]

(I tried to use ak.firsts() with axis = -1, but it does not seem to be implemented yet.)


Now every innermost entry in new belongs to exactly one element of a. By replacing the current y of new with the previously extracted y, I end up with my desired result:

new['y'] = y
#new= [[None, None], [{'x': 3, 'y': [3, 5]}], [], [None, {'x': 5, 'y': [8]}, {'x': 6, 'y': [8, 9]}]]
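The result still contains None for elements of a without any surviving partner. As a cross-check, the same collapse can be sketched in plain Python, with an option to drop those placeholders. collapse_groups and its keep_empty flag are hypothetical names, and the input literal is copied from the printout of new after the cut.

```python
# Nested result after the cut, copied from the printout of `new` above.
new_nested = [[[], []],
              [[{'x': 3, 'y': 3}, {'x': 3, 'y': 5}]],
              [],
              [[], [{'x': 5, 'y': 8}], [{'x': 6, 'y': 8}, {'x': 6, 'y': 9}]]]

def collapse_groups(event, keep_empty=True):
    """Hypothetical helper: turn each group into one record with a list of y values."""
    collapsed = []
    for group in event:
        if group:
            # All pairs in a group share the same x, so take it from the first pair.
            collapsed.append({'x': group[0]['x'], 'y': [p['y'] for p in group]})
        elif keep_empty:
            collapsed.append(None)
    return collapsed

with_none = [collapse_groups(ev) for ev in new_nested]
without_none = [collapse_groups(ev, keep_empty=False) for ev in new_nested]
print(with_none)
print(without_none)
```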

Anyway, should you know a better solution, I'd be pleased to hear it.