The problem I am facing is creating a new array from a set of indices. Namely, I have a set of particles and jets, and for each jet there is a list of indices of the particles that belong to that jet.
```python
import awkward as ak
import vector

particles = ak.Array({
    "mass": ak.Array([1, 2, 3, 4, 5]),
    "px": ak.Array([1, 2, 3, 4, 5]),
    "py": ak.Array([1, 2, 3, 4, 5]),
    "pz": ak.Array([1, 2, 3, 4, 5]),
})

particles_p4 = vector.awk(ak.zip({
    "mass": particles["mass"],
    "x": particles["px"],
    "y": particles["py"],
    "z": particles["pz"],
}))

jet = ak.Array({
    "constituents": ak.Array([[1, 2, 4, 5], [3]]),
    "energy": ak.Array([1.2, 3.4]),
})
```
What I would like to get is, for each jet, the `particles_p4` values in a new array, like so:

```
<Array [[{x: 1, y: 1, z: 1, ... z: 4, tau: 4}]] type='2 * var * {"x": int64, "y"...'>
```

where the first element would be:

```
<Array [{x: 1, y: 1, z: 1, ... z: 5, tau: 5}] type='4 * {"x": int64, "y": int64,...'>
```
Doing this with a for loop is trivial, but I suspect it can be done more efficiently with the tools available in Awkward Array.
Bonus: what about more deeply nested arrays, for example where each event has multiple jets?
First, I think you mean that `jet.constituents` is `[[0, 1, 3, 4], [2]]`, because I'd expect the `"constituents"` indexes to be 0-based, not 1-based. But even if they are 1-based, just start by subtracting 1.

The biggest problem here is that these indexes are nested one level deeper than the `particles_p4` that you want to slice. You want the `0`, `1`, `3`, `4`, and also the `2`, in your `jet.constituents` to be indexes in the non-nested list, `particles_p4`.

If we just arbitrarily flatten them with `ak.flatten(jet.constituents, axis=-1)` (`axis=-1` means to squash the last/deepest dimension), we get `[0, 1, 3, 4, 2]`. These indexes are exactly what you'd need to apply to `particles_p4`: `particles_p4[ak.flatten(jet.constituents, axis=-1)]` selects the particles of both jets, in jet order. Here, I'm using the current (2.x) version of Awkward Array so that I can use `.show()`, but the integer-array slice works in any version of Awkward Array.

If we take that as a partial solution, all we need to do now is put the nested structure back into the result.
`ak.flatten` has an opposite, `ak.unflatten`, which takes a flat array and adds nestedness from an array of list lengths. You can get the list lengths from the original `jet.constituents` with `ak.num`. Again, I'll use `axis=-1` so that this answer will generalize to deeper nestings.

For the bonus round, if all of the above arrays (`particles_p4` and `jet`) were arrays of lists, where each list represents one event, rather than an array representing one event, then the same recipe would apply. I'm taking it as a given that the length of `particles_p4_by_event` is equal to the length of the `jet_by_event` arrays, and that the values of `jet_by_event.constituents` are indexes within each event in `particles_p4_by_event` (not global indexes; each event should restart at zero). That is, all of your arrays agree on how many events there are, and each event is handled individually, with no cross-over between events.