I'm thinking of migrating from Awkward Array 1 to 2.
With Awkward 1, I read Parquet files lazily like this:
import awkward as ak
import numpy as np
tree = ak.from_parquet(filename, lazy=True)  # data is not read yet
print(np.max(tree["x"]), end=" ")  # "x" is only read here
How can I do the same with Awkward 2?
The one major interface change from Awkward 1.x to Awkward 2.x is that lazy arrays (PartitionedArray of VirtualArray) have been replaced by dask-awkward. The motivation for this was to give users more control over when the array-reading happens (when you say .compute(), and not before), as well as where (distributed on any cluster that runs Dask jobs).
Here's an example:
This object represents data that have not been read yet. It's a kind of lazy array. It has all of the metadata, such as field names and data types.
You can perform a lazy computation (all ak.* functions work on dak.Arrays), and it remains lazy, unlike an Awkward 1.x array.
But when you say .compute(), you get fully evaluated data.
See dask.org for distributing and scaling up processes, either on one computer or on a cluster of computers.