Get the partition ID in Dask for a DataFrame

Is it possible to get the partition ID in Dask after splitting a pandas DataFrame into partitions?

For example:

import dask.dataframe as dd
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10, 2), columns=["A", "B"])
df_parts = dd.from_pandas(df, npartitions=2)
part1 = df_parts.get_partition(0)

Of the two partitions, part1 is the first. Is it possible to do something like the following?

part1.get_partition_id() => which would return 0 or 1

Or is it possible to get the partition ID by iterating through df_parts?

Best answer

I'm not sure whether there is a built-in function for this, but you can achieve what you want with enumerate(df_parts.to_delayed()).

to_delayed() produces a list of Delayed objects, one per partition, so you can iterate over them and use the index from enumerate as the partition ID.