Depending on your few on my approach this is either a question about using np.unique() on awkward1 arrays or a call for a better approach:
Let a and b be two awkward1 arrays of the same outer length (number of events) but different inner lengths. For example:
a = [[1, 2], [3] , [] , [4, 5, 6]]
b = [[7] , [3, 5], [6], [8, 9]]
Let f: (x, y) -> z be a function that acts on two numbers x and y and results in the number z. For example:
f(x, y):= y - x
The idea is to compare every element in a with every element in b via f for each event and filter out the matches of a and b pairs that survive some cut applied to f. For example:
f(x, y) < 4
My approach for this is:
a = ak.from_iter(a)
b = ak.from_iter(b)
c = ak.cartesian({'x':a, 'y':b})
#c= [[{'x': 1, 'y': 7}, {'x': 2, 'y': 7}], [{'x': 3, 'y': 3}, {'x': 3, 'y': 5}], [], [{'x': 4, 'y': 8}, {'x': 4, 'y': 9}, {'x': 5, 'y': 8}, {'x': 5, 'y': 9}, {'x': 6, 'y': 8}, {'x': 6, 'y': 9}]]
i = ak.argcartesian({'x':a, 'y':b})
#i= [[{'x': 0, 'y': 0}, {'x': 1, 'y': 0}], [{'x': 0, 'y': 0}, {'x': 0, 'y': 1}], [], [{'x': 0, 'y': 0}, {'x': 0, 'y': 1}, {'x': 1, 'y': 0}, {'x': 1, 'y': 1}, {'x': 2, 'y': 0}, {'x': 2, 'y': 1}]]
diff = c['y'] - c['x']
#diff= [[6, 5], [0, 2], [], [4, 5, 3, 4, 2, 3]]
cut = diff < 4
#cut= [[False, False], [True, True], [], [False, False, True, False, True, True]]
new = c[cut]
#new= [[], [{'x': 3, 'y': 3}, {'x': 3, 'y': 5}], [], [{'x': 5, 'y': 8}, {'x': 6, 'y': 8}, {'x': 6, 'y': 9}]]
new_i = i[cut]
#new_i= [[], [{'x': 0, 'y': 0}, {'x': 0, 'y': 1}], [], [{'x': 1, 'y': 0}, {'x': 2, 'y': 0}, {'x': 2, 'y': 1}]]
It is possible that pairs with the same element from a but different elements from b survive the cut. (e.g. {'x': 3, 'y': 3} and {'x': 3, 'y': 5})
My goal is to group those pairs with the same element from a together and therefore reshape the new array into:
new = [[], [{'x': 3, 'y': [3, 5]}], [], [{'x': 5, 'y': 8}, {'x': 6, 'y': [8, 9]}]]
My only idea how to achieve this is to create a list of the indexes from a that are still present after the cut by using new_i:
i = new_i['x']
#i= [[], [0, 0], [], [1, 2, 2]]
However, I need a unique version of this list to make every index appear only once. This could be achieved with np.unique() in NumPy. But doesn't work in awkward1:
np.unique(i)
<__array_function__ internals> in unique(*args, **kwargs)
TypeError: no implementation found for 'numpy.unique' on types that implement __array_function__: [<class 'awkward1.highlevel.Array'>]
My question:
Is their a np.unique() equivalent in awkward1 and/or would you recommend a different approach to my problem?
Okay, I still don't know how to use
np.unique()on my arrays, but I found a solution for my own problem:In my previous approach I used the following code to pair up booth arrays.
However, with the
nested = Trueparameter fromak.cartesian()I get a list grouped by the elements ofa:After the cut I end up with:
I extract the
yvalues and reduce the most inner layer of the nested lists ofnewto only one element:(I tried to use
ak.firsts()withaxis = -1but it seems to be not implemented yet.)Now every most inner entry in
newbelongs to exactly one element froma. By replacing the currentyofnewwith the previously extractedyI end up with my desired result:Anyway, should you know a better solution, I'd be pleased to hear it.