Creating a new Awkward array from indices

285 Views Asked by At

The problem I am facing is to create a new array from a set of indices. Namely, I have a set of particles and jets, for each jet there is a list of indices of which particles belong to the given jet.

    import awkward as ak
    import vector

    particles = ak.Array({
        "mass": ak.Array([1, 2, 3, 4, 5]),
        "px": ak.Array([1, 2, 3, 4, 5]),
        "py": ak.Array([1, 2, 3, 4, 5]),
        "pz": ak.Array([1, 2, 3, 4, 5]),
    })

    particles_p4 = vector.awk(ak.zip({
        "mass": particles["mass"],
        "x": particles["px"],
        "y": particles["py"],
        "z": particles["pz"]
    }))

    jet = ak.Array({
        "constituents": ak.Array([[1, 2, 4, 5], [3]]),
        "energy": ak.Array([1.2, 3.4])
    })

What I would like to get is for each jet the particle_p4 values in a new array like so:

<Array [[{x: 1, y: 1, z: 1, ... z: 4, tau: 4}]] type='2 * var * {"x": int64, "y"...'>

where the first element of that would be:

<Array [{x: 1, y: 1, z: 1, ... z: 5, tau: 5}] type='4 * {"x": int64, "y": int64,...'>

Doing this with a for loop is trivial, however I have the notion that this can be done in a more efficient way with the tools available in Awkward array.

Bonus: What about even more nested arrays, for example where each event has multiple jets?

1

There are 1 best solutions below

0
On BEST ANSWER

First, I think you mean you have

jet = ak.Array({
    "constituents": ak.Array([[0, 1, 3, 4], [2]]),
    "energy": ak.Array([1.2, 3.4])
})

because I'd expect the "constituents" indexes to be 0-based, not 1-based. But even if it is 1-based, just start by subtracting 1.

>>> jet.constituents - 1
<Array [[0, 1, 3, 4], [2]] type='2 * var * int64'>

The biggest problem here is that these indexes are nested one level deeper than the particles_p4 that you want to slice. You want the 0, 1, 3, 4, and also the 2, in your jet.constituents to be indexes in the not-nested list, particles_p4.

If we just arbitrarily flatten them (axis=-1 means to squash the last/deepest dimension):

>>> ak.flatten(jet.constituents, axis=-1)
<Array [0, 1, 3, 4, 2] type='5 * int64'>

these indexes are exactly what you'd need to apply to particles_p4. Here, I'm using the current (2.x) version of Awkward Array, so that I can use .show(), but the integer-array slice works in any version of Awkward Array.

>>> particles_p4[ak.flatten(jet.constituents, axis=-1)].show(type=True)
type: 5 * Momentum4D[
    x: int64,
    y: int64,
    z: int64,
    tau: int64
]
[{x: 1, y: 1, z: 1, tau: 1},
 {x: 2, y: 2, z: 2, tau: 2},
 {x: 4, y: 4, z: 4, tau: 4},
 {x: 5, y: 5, z: 5, tau: 5},
 {x: 3, y: 3, z: 3, tau: 3}]

If we take that as a partial solution, all we need to do now is put the nested structure back into the result.

ak.flatten has an opposite, ak.unflatten, which takes a flat array and adds nestedness from an array of list lengths. You can get the list lengths from the original jet.constituents with ak.num. Again, I'll use axis=-1 so that this answer will generalize to deeper nestings.

>>> lengths = ak.num(jet.constituents, axis=-1)
>>> lengths
<Array [4, 1] type='2 * int64'>

>>> rearranged = particles_p4[ak.flatten(jet.constituents, axis=-1)]
>>> rearranged
<MomentumArray4D [{x: 1, y: 1, z: 1, tau: 1}, ..., {...}] type='5 * Momentu...'>

>>> result = ak.unflatten(rearranged, lengths, axis=-1)
>>> result.show(type=True)
type: 2 * var * Momentum4D[
    x: int64,
    y: int64,
    z: int64,
    tau: int64
]
[[{x: 1, y: 1, z: 1, tau: 1}, {x: 2, ...}, ..., {x: 5, y: 5, z: 5, tau: 5}],
 [{x: 3, y: 3, z: 3, tau: 3}]]

For the bonus round, if all of the above arrays (particles_p4 and jet) were arrays of lists, where each list represents one event, rather than an array representing one event, then the above would hold. I'm taking it as a given that the length of the particles_p4_by_event is equal to the length of the jet_by_event arrays, and the values of jet_by_event.constituents are indexes within each event in particles_p4_by_event (not global indexes; each event should restart at zero). That is, all of your arrays agree on how many events there are, and each event is handled individually, with no cross-over between events.