I have a very large DataFrame that I partition on the values of one column, "A", using the dask.DataFrame.set_index() method. These N partitions are still too large to fit into memory when I map a function f() over the dask DataFrame dd. I would like to further split each of these N partitions into, say, m smaller DataFrames (of equal size or not), so that dd.map_partitions(f) can run in an optimal way given the resources on my cluster.
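Roughly, the setup looks like this (typed from memory on my phone, so the data and f() are just toy placeholders for the real thing; I write ddf for the dask DataFrame I called dd above, to keep the usual dask.dataframe alias free):

```python
import pandas as pd
import dask.dataframe as dd

# Toy stand-in for the real data: "A" is the grouping column.
pdf = pd.DataFrame({"A": [1, 1, 2, 2, 3, 3], "x": range(6)})
ddf = dd.from_pandas(pdf, npartitions=3)

# Partition on the values of "A": rows sharing a value of "A"
# end up in the same partition (N partitions in total).
ddf = ddf.set_index("A")

def f(df):
    # Placeholder for the real per-partition function; it relies on
    # the partition boundaries following the values of "A".
    return df.assign(y=df["x"] * 2)

result = ddf.map_partitions(f).compute()
```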
I tried using the repartition() method on the partitioned dd, but I either stay stuck with the N partitions or end up with 10 partitions that mix values of A (which isn't compatible with how my function f works). One idea would be to use dd.map_partitions to split each pandas df within dd into smaller pieces and apply f piece by piece (roughly sketched below), but that seems quite convoluted. Any (better) suggestions? Thanks!

p.s.: I am on my phone and can't easily paste a template case; I will do so later if needed.
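For concreteness, this is roughly what I tried and the workaround I have in mind, continuing from the snippet above (again untested and typed from memory; f_in_chunks and m=10 are just placeholders):

```python
import pandas as pd

# What I tried: this either leaves the original N partitions untouched
# or gives ~10 partitions whose index mixes several values of "A".
ddf10 = ddf.repartition(npartitions=10)

# The convoluted idea: keep dask's N partitions, but have the mapped
# function cut each pandas partition into at most m row slices and run
# f() on one slice at a time.
def f_in_chunks(df, m=10):
    if len(df) == 0:
        return f(df)  # keeps the output schema for dask's metadata
    step = -(-len(df) // m)  # ceil(len(df) / m) rows per slice
    pieces = [f(df.iloc[i:i + step]) for i in range(0, len(df), step)]
    return pd.concat(pieces)

result = ddf.map_partitions(f_in_chunks).compute()
```

As far as I can tell, the f_in_chunks route only limits how much data f() sees at once; dask still schedules just N tasks, so it doesn't really spread the work more evenly over the cluster, which is part of why it feels convoluted to me.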