I am using torch.distributed in my code, which I launch with the torchrun command from my terminal. I want to profile it using the Scalene profiler.
Sample torchrun command:

```bash
torchrun --nnodes 1 --nproc_per_node 6 --standalone main.py --train
```
Sample Scalene profiling command:

```bash
scalene --no-browser --reduced-profile --cpu --outfile profile_rnd00_pong_5hr_teslaT4_test00.html --profile-interval 120 main.py --train
```
I tried combining these two as follows, but it does not work:

```bash
scalene --no-browser --reduced-profile --cpu --outfile profile_rnd00_pong_5hr_teslaT4_test00.html --profile-interval 120 torchrun --nnodes 1 --nproc_per_node 6 --standalone main.py --train
```
Is there a way to use Scalene while still relying on torchrun to launch my distributed PyTorch code?
I have also tried the following:

```bash
python -m scalene --- -m torch.distributed.run --nnodes 1 --nproc_per_node 6 --standalone main.py --train
```
and it raised the following error:

```
Scalene: Program did not run for long enough to profile.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 136598) of binary: /tmp/scalenelcqus7e6/python
Error in program being profiled:
```
Note: main.py uses argparse and accepts --train as one of its options.
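For context, here is a minimal sketch of how main.py is structured (the real script does much more; the argparse and torch.distributed setup shown here are only illustrative):

```python
# Minimal, illustrative sketch of main.py's structure (my real script
# does much more). torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK in
# the environment before this script starts.
import argparse

import torch.distributed as dist


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--train", action="store_true", help="run training")
    args = parser.parse_args()

    # Under torchrun, init_process_group reads rank/world size from the env.
    dist.init_process_group(backend="gloo")  # "nccl" for GPU training
    if args.train:
        print(f"rank {dist.get_rank()}: training step would run here")
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```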
I have also looked at the PyTorch profiler, but it does not seem to help me: it does not profile the non-PyTorch parts of my code. I need to profile those other parts in order to optimize them, such as my use of Python lists/arrays and conversions between object types.
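To make the kind of hot spot concrete, this is illustrative (not my actual code) of the pure-Python work I want the profiler to attribute time to:

```python
# Illustrative example (not my actual code) of the pure-Python work I
# want to profile: growing plain lists and converting them to tensors.
import torch


def collect_frames(n_steps: int) -> torch.Tensor:
    frames = []  # plain Python list, grown one step at a time
    for _ in range(n_steps):
        frames.append([0.0] * (84 * 84))  # per-step Python-object churn
    # list-of-lists -> tensor: a conversion/copy I want measured
    return torch.tensor(frames)


if __name__ == "__main__":
    print(collect_frames(500).shape)
```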
I really appreciate your help. Thank you.