Dask task stream finishing much faster than actual computation


I am attempting to run a process in parallel using dask.bag, but the process is taking longer than the task stream seems to suggest.

  • I am on dask version 2023.9.3
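  • I am on a single machine

        import time

        import numpy as np

        # `bag`, `shading_candidates_np` and `reference_gdf` are defined earlier (not shown).
        def combine_shader_polygons(i):
            """Return the union of all polygons that shade candidate i, or None."""
            shader_polygon = None
            # Indices of the reference polygons flagged as shaders for row i.
            shader_indices = np.flatnonzero(shading_candidates_np[i])
            if len(shader_indices) == 0:
                pass  # nothing shades this candidate
            elif len(shader_indices) == 1:
                shader_polygon = reference_gdf.loc[shader_indices].iloc[0]
            else:
                polygons = reference_gdf.loc[shader_indices]
                shader_polygon = polygons.unary_union
            return shader_polygon

        # Time the whole parallel run, including scheduler and transfer overhead.
        start = time.time()
        shader_polygons = bag.map(combine_shader_polygons).compute(scheduler='processes')
        timer = round(time.time() - start, 2)
        print(f'Checkpoint 1: {timer}s')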

As the task stream image below shows, the tasks take around 350 ms from start to finish, but the print statement reports 5.3 s. Is there a way to see what is taking up the rest of the time?

[Image: Dask Task Stream]
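One way I could look for the missing time is to profile the calling process around the compute call. This is only a minimal sketch using the standard library `cProfile` and `pstats` modules. `profile_call` and `stand_in_workload` are hypothetical names introduced for illustration, and the stand-in function is a placeholder for the real `bag.map(...).compute(...)` call.

```python
import cProfile
import io
import pstats
import time


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile and return (result, wall_seconds, report_text)."""
    profiler = cProfile.Profile()
    start = time.perf_counter()
    result = profiler.runcall(func, *args, **kwargs)
    wall = time.perf_counter() - start

    # Render the most expensive calls (by cumulative time) into a string.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, wall, buffer.getvalue()


def stand_in_workload(n):
    # Hypothetical placeholder for the real call, e.g.
    # lambda: bag.map(combine_shader_polygons).compute(scheduler='processes')
    return sum(i * i for i in range(n))


result, wall, report = profile_call(stand_in_workload, 100_000)
print(f"wall time: {wall:.3f}s")
print(report)
```

The profiler only sees the process that calls `compute`, so work done inside worker processes would show up there as time spent waiting, while anything the calling process does itself (building the graph, serializing inputs, collecting results) should appear as named function calls in the report.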
