I have a problem when trying to plot a 2D histogram or graph from jagged arrays of different lengths.
Here is a simple example. Suppose there are 7 events, each with a gen-level pT and its Et.
pT = [ [46.8], [31.7], [21], [29.9], [13.9], [41.2], [15.7] ]
Et = [ [41.4], [25.5, 20], [19.6], [27.4], [12, 3.47], [37.8], [10] ]
Here, some events (the 2nd and 5th) have two y values corresponding to one x value. I want to make a graph or 2D histogram with x = pT and y = Et, plotting both points for such events, i.e. (31.7, 25.5) and (31.7, 20).
How can I align these values for plotting?
What you want to do is "broadcast" the two arrays:
Awkward broadcasting is a generalization of NumPy broadcasting to include variable-length lists.
Broadcasting usually happens automatically when you're performing a mathematical calculation, but you can also do it manually with `ak.broadcast_arrays`.
When the two arrays have different depths (different "dimensions" in NumPy terminology), scalars from one are replicated to align with all elements of lists in the other.
You have two lists of the same depth:
To manually broadcast them, you could reduce the depth of `pT` by taking the first element from each list. Then you can broadcast each scalar of `pT` into each list of `Et`.
This will be more clear if I print them in their entirety by turning them into Python lists:
Now you see that the `31.7` has been duplicated to align with each value in `[25.5, 20.0]`.
In NumPy, you'll often see examples of broadcasting a dimension of length 1, rather than creating a dimension, like this:
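A small NumPy-only illustration of that rule (the arrays `a` and `b` are made up for this sketch):

```python
import numpy as np

a = np.array([[1.0], [2.0], [3.0]])                       # shape (3, 1)
b = np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]])  # shape (3, 2)

# The length-1 dimension of `a` is stretched to match the length-2 dimension of `b`.
a_b, b_b = np.broadcast_arrays(a, b)
print(a_b.shape)     # (3, 2)
print(a_b.tolist())  # [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
```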
Awkward Array follows this rule, but only if the dimension has length "exactly 1," not "a bunch of variable-length lists that happen to each have length 1." The way I've written `pT`, it has the latter.
Since these lists are in-principle variable, they don't broadcast the way that length-1 NumPy arrays would.
If you explicitly cast the array as NumPy, it will have regular types. (Note to self: it would be nice to have a way to turn a variable-length dimension regular or vice-versa without converting the whole array to NumPy.)
So an alternate way to get the same broadcasting is to convert `pT` to NumPy, instead of picking out the first element of each list with `pT[:, 0]`.
Either way, an assumption is being made that `pT` consists of lists of length 1. The `pT[:, 0]` expression assumes this because it requires something to have index `0` in each list (so the length is at least 1) and it ignores whatever else might be there. The `ak.to_numpy(pT)` expression will raise an exception if the `pT` array doesn't happen to be regular, that is, a shape that can be expressed in NumPy.
Now that you have `pT_broadcasted` and `Et` aligned with the same structure, you'll have to flatten them both to pass them to a plotting routine (which expects non-jagged data).
The plotting routine will probably try `np.asarray` on each of these, which is identical to `ak.to_numpy`, which will work because these flattened arrays are regular. If you have doubly jagged data or something more complex, you'd have to flatten more.