Multiple progress bars with Python multiprocessing


I use FEA software with a dedicated "wrapper" (let's call it sw) that provides an opaque model handle, obtained as model = sw.Load("myfile.ext").

To keep track of the computation progress, the wrapper lets us attach a custom "progress handler function" with the signature def progressHandlerFn(model_name, progress), which we simply assign to the model:

model.progressHandlerFunction = progressHandlerFn

We can then run the model via the command model.Run().

During the simulation, our custom function is called regularly (automatically by the software) and we can do basic operations such as logging or printing the progress.

To handle multiple runs, I am using the multiprocessing package. In that context, I would like to be able to track all my simulations with a progress bar. I have tried several packages, namely progress, alive-progress, rich, tqdm and atpbar. All of them seem to struggle with the parallelization introduced by multiprocessing.

The main struggle is that most of these libraries use the context-manager syntax with bar() as bar, which does not map well onto parallel workers (as far as my knowledge goes).

The closest I could get was with the "manual" usage of tqdm, which allows initializing and updating bars by hand. Still, it does not work as expected (see code below).

import multiprocessing

from tqdm import tqdm

import sw

PARALLEL_RUNS = 2
model_files = ["model1.ext", "model2.ext"]  # example input files
names_to_bar = {}

def progressHandlerFn(model_name, progress):
    # Called automatically by the software; progress is a percentage
    if model_name not in names_to_bar:
        names_to_bar[model_name] = {"bar": tqdm(total=100), "last_progress": 0}
    entry = names_to_bar[model_name]
    entry["bar"].update(progress - entry["last_progress"])
    entry["last_progress"] = progress

def run_simulation(file):
    print(f"Loading and running {file}")
    m = sw.Load(file)
    m.progressHandlerFunction = progressHandlerFn
    m.Run()

def run_simulations(files, parallel_runs):
    with multiprocessing.Pool(parallel_runs) as pool:
        pool.map(run_simulation, files)

if __name__ == "__main__":
    run_simulations(model_files, PARALLEL_RUNS)
    for entry in names_to_bar.values():
        entry["bar"].close()

When I do that, I obtain a single line in the terminal, alternately overwritten by the first model's and the second model's progress. I expect to see two independent progress bars, but I only get one.

By the way, I noticed afterward that the names_to_bar variable is likely not shared among the worker processes (because that's how multiprocessing works), but that should not really be an issue here: each worker gets its own dict (with only one entry).
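For reference, tqdm's own documentation addresses exactly this overwriting behavior: each bar needs an explicit position (its terminal row), and all processes must share one write lock. A minimal sketch of that documented pattern, with worker and task names of my own (the wrapper is not involved here):

```python
from multiprocessing import Pool, RLock

from tqdm import tqdm

def worker(task):
    pos, name = task
    # position pins this bar to its own terminal row
    bar = tqdm(total=100, desc=name, position=pos)
    for _ in range(4):
        bar.update(25)
    bar.close()
    return name

def run_all(names):
    tasks = list(enumerate(names))
    # One shared lock keeps concurrent bar redraws from garbling each other
    tqdm.set_lock(RLock())
    with Pool(len(tasks), initializer=tqdm.set_lock,
              initargs=(tqdm.get_lock(),)) as pool:
        return pool.map(worker, tasks)

if __name__ == "__main__":
    run_all(["model1", "model2"])
```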

My main issue, as you may have guessed, is that I have no control over progressHandlerFn's arguments, so I cannot pass in a variable I would have set at a level above...
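One way around the fixed signature: since each pool worker is a separate process, a module-level global installed by the pool's initializer is visible to the callback without changing its arguments. A sketch of that idea, where run_simulation is a hypothetical stand-in for sw.Load(file) / model.Run() (the real handler body would be identical):

```python
import multiprocessing

# Module-level slot; each worker process gets its own copy,
# filled in by init_worker() below.
_progress_queue = None

def init_worker(queue):
    # Pool initializer: runs once in every worker process.
    global _progress_queue
    _progress_queue = queue

def progressHandlerFn(model_name, progress):
    # Signature imposed by the wrapper, but it can still reach
    # the process-global queue installed by init_worker().
    _progress_queue.put((model_name, progress))

def run_simulation(file):
    # Hypothetical stand-in for loading and running a model; the real
    # software would call progressHandlerFn automatically during the solve.
    for pct in (25, 50, 75, 100):
        progressHandlerFn(file, pct)
    return file

def run_simulations(files, parallel_runs):
    manager = multiprocessing.Manager()
    queue = manager.Queue()
    with multiprocessing.Pool(parallel_runs, initializer=init_worker,
                              initargs=(queue,)) as pool:
        results = pool.map(run_simulation, files)
    # Drain the reported (model_name, progress) updates
    updates = []
    while not queue.empty():
        updates.append(queue.get())
    return results, updates
```

The main process can then consume the queue and own all the progress bars itself.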

1 Answer

I'd recommend managing all your progress bars in the main process.

There's an example in the Enlighten repo where the workers simply put their count into a queue and the main process loops over the worker queues to update the progress bars. Should be easy to adapt for your use case.

https://github.com/Rockhopper-Technologies/enlighten/blob/main/examples/multiprocessing_queues.py

In this example, each worker processes one system; the number of systems is between 10 and 20, but the worker count is pinned at 4. Each worker has its own progress bar, and a separate bar tracks the overall progress. The main progress bar uses 3 colors: yellow for started, green for completed, and red for errors.
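The same queue-based idea can be sketched with tqdm instead of Enlighten. Here fake_run stands in for loading and running a model (names are mine, not from the wrapper): workers push (name, progress) tuples plus a None sentinel through a managed queue, and the main process owns every bar.

```python
import multiprocessing

from tqdm import tqdm

def fake_run(args):
    # Stand-in for sw.Load(file).Run(); reports progress through the queue.
    file, queue = args
    for pct in range(0, 101, 20):
        queue.put((file, pct))
    queue.put((file, None))  # sentinel: this model is done
    return file

def run_with_bars(files, parallel_runs):
    manager = multiprocessing.Manager()
    queue = manager.Queue()
    bars, last = {}, {}
    with multiprocessing.Pool(parallel_runs) as pool:
        async_res = pool.map_async(fake_run, [(f, queue) for f in files])
        done = 0
        while done < len(files):
            name, progress = queue.get()
            if progress is None:
                done += 1
                continue
            if name not in bars:
                # One bar per model, on its own terminal row
                bars[name] = tqdm(total=100, desc=name, position=len(bars))
                last[name] = 0
            bars[name].update(progress - last[name])
            last[name] = progress
        results = async_res.get()
    for bar in bars.values():
        bar.close()
    return results
```

Because only the main process touches the bars, no locks or positions need to be coordinated across processes.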
