Progress Bar not displaying in GNU Parallel with SLURM script

I am new to GNU Parallel and am trying to run a few simulations. I submit the bash script below to a cluster via SLURM. In it, parallel calls a function, run_simulation, which in turn runs other bash scripts. Those scripts write their output to the current directory, which is different for each job.

#!/bin/bash
# Job name:
#SBATCH --job-name=Run_MD_Sim
#
# Account:
#SBATCH --account=fc_mllam
#
# Partition:
#SBATCH --partition=savio3
#
# Request one node:
#SBATCH --nodes=1
#
# Specify number of tasks for use case (example):
#
#
# Processors per task:
#SBATCH --cpus-per-task=2
#
# Wall clock limit:
#SBATCH --time=5:00:30
#
## Command(s) to run (example):

module load intel
module load openmpi
module load gcc
module load cmake
module load gnu-parallel/2019.03.22

energy_list=("90")
fluence_list=("1000")

len_energy=${#energy_list[@]}
len_fluence=${#fluence_list[@]}

# Change this line if number of nodes requested is changed
val="ALE_Cycle_Run_2.sh"

# Function to run MD simulation for a single combination of energy and fluence
run_simulation() {
    enval="$1"
    flval="$2"
    counter="$3"
    val="$4"
    
    # Create a directory to carry out computations. If node=1, then we are in fcmd_bondorder
    mkdir "Temp_Directory_$counter"
    
    # Check if using more than one node. If more than one node is used, then working directory will be the home directory. Below lines will change
    cp ../temp_000588-322.cfg "Temp_Directory_$counter/temp_000000-000.cfg"

    # Copy simulation files into this folder
    cp *.o "Temp_Directory_$counter/"
    cp *.cpp "Temp_Directory_$counter/"
    cp *.h "Temp_Directory_$counter/"
    cp Makefile "Temp_Directory_$counter/"
    cp "$val" "Temp_Directory_$counter/"
    cp Bond_Param_Gen.sh "Temp_Directory_$counter/Bond_Param_Gen.sh"

    # Change directory to temporary directory
    cd "Temp_Directory_$counter"
    
    # Run the main MD simulation. The output will be stored in the current directory
    bash "$val" "$enval" "$flval"

    # Make directory to store the bond-order files
    mkdir Data/
    mv *.txt Data/
    rm Data/*.txt
    bash Bond_Param_Gen.sh
    mv *.txt Data/
    mv *.cfg Data/

    # Home directory or scratch directory
    directory="/global/home/users/shoubhaniknath"
    new_filename="Data ${flval} impacts energy ${enval} number ${counter}"

    # Rename and move the data folder
    mv "Data" "$directory/$new_filename"
}

# Export the function so that GNU Parallel can access it
export -f run_simulation

# Set number of jobs based on number of cores available and number of threads per core
export JOBS_PER_NODE=$(( $SLURM_CPUS_ON_NODE / $SLURM_CPUS_PER_TASK ))

# Run simulations in parallel
for enval in "${energy_list[@]}"; do
    for flval in "${fluence_list[@]}"; do
        # Use GNU Parallel to parallelize the loop over 'counter'
        # Use below line for multiple nodes
        # parallel --dry-run --jobs $JOBS_PER_NODE --slf hostfile run_simulation "$enval" "$flval" {} "$val" ::: {1..3}
        # For single node, use below line
        echo $JOBS_PER_NODE
        parallel --jobs $JOBS_PER_NODE --joblog task.log --resume --bar run_simulation "$enval" "$flval" {} "$val" ::: {1..3}
    done
done

My issue is that parallel does not print its progress bar when the script runs under SLURM, and I have no idea why. Simple parallel commands executed in the current working directory do show the progress bar. What am I doing wrong here?
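For anyone reproducing this outside a SLURM allocation (for example, to compare against the simple interactive commands mentioned above): the JOBS_PER_NODE line in the script depends on SLURM_CPUS_ON_NODE and SLURM_CPUS_PER_TASK, which are normally only set inside a job. The following is a minimal sketch with fallbacks, not part of the original script. The helper name jobs_per_node and the fallback values (all online cores, one CPU per task) are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): compute the number of
# simultaneous jobs, falling back to local values when SLURM variables are unset.
jobs_per_node() {
    # Assumption: outside SLURM, use every online core and one CPU per task.
    local cpus_on_node="${SLURM_CPUS_ON_NODE:-$(getconf _NPROCESSORS_ONLN)}"
    local cpus_per_task="${SLURM_CPUS_PER_TASK:-1}"
    local jobs=$(( cpus_on_node / cpus_per_task ))

    # Integer division can yield 0 (e.g. 1 core, 2 CPUs per task);
    # always allow at least one job slot.
    if (( jobs < 1 )); then
        jobs=1
    fi
    echo "$jobs"
}

JOBS_PER_NODE=$(jobs_per_node)
export JOBS_PER_NODE
echo "$JOBS_PER_NODE"
```

Inside a job with 32 CPUs on the node and --cpus-per-task=2, this gives the same value as the original arithmetic.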

Answer by Ole Tange

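Try replacing the two nested for loops with a single parallel call that takes the counter, energy, and fluence lists as three input sources, something like this: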

parallel --jobs $JOBS_PER_NODE --joblog task.log --resume --bar run_simulation {2} {3} {1} "$val" ::: {1..3} ::: "${energy_list[@]}" ::: "${fluence_list[@]}"
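To see which commands that single call generates, it can help to list the combinations by hand. With several ::: input sources, parallel runs one job per combination, and {1}, {2}, {3} refer to the first, second, and third source. The sketch below emulates that expansion in plain bash so it runs without parallel installed. The function name list_combinations and the second energy value 120 are hypothetical and only there to make the cross product visible.

```shell
#!/bin/bash
# Input lists; "120" is a hypothetical extra value, not from the original script.
energy_list=("90" "120")
fluence_list=("1000")

# Hypothetical helper: print the command lines a call of the form
#   parallel run_simulation {2} {3} {1} ::: {1..3} ::: energies ::: fluences
# would generate (the "$val" script argument is left out for brevity).
list_combinations() {
    local counter enval flval
    for counter in 1 2 3; do                    # first source  -> {1}
        for enval in "${energy_list[@]}"; do    # second source -> {2}
            for flval in "${fluence_list[@]}"; do  # third source -> {3}
                # {2} {3} {1} puts energy and fluence before the counter,
                # matching the argument order run_simulation expects.
                echo "run_simulation $enval $flval $counter"
            done
        done
    done
}

list_combinations
```

With parallel itself, adding --dry-run (as in the commented-out multi-node line of the question) prints the generated commands instead of running them. Note also that --resume decides what to skip from the sequence numbers in the --joblog file, so if task.log from an earlier run is still present, jobs already recorded there are not run again.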