I have a Bash script that I eventually want to run on a cluster via condor_submit. The script needs to load a module with "module load matlab/R2020a". However, the loaded version never seems to take effect.
The script looks like this:
```bash
#!/bin/bash
module load cudnn/8.2.0-cu11.x
module load cuda/11.2
module load matlab/R2020a
echo "$PATH"
echo "$SHELL"
# Check MATLAB version
echo_and_run() { echo "$*" ; "$@" ; }
# Direct call
matlab -e | grep "MATLAB="
# Same call through the wrapper
echo_and_run matlab -e | grep "MATLAB="
# ...
# set up input etc.
# ...
echo_and_run matlab -nodisplay -batch "......matlab commands"
```
When I execute it from my home shell, the output is:
```
...
/bin/bash
MATLAB=/is/software/matlab/linux/R2014a
MATLAB=/is/software/matlab/linux/R2014a
...
```
Both MATLAB paths are wrong. When I instead source it in my local shell (source ./scriptname.sh), the output is even more confusing:
```
...
/bin/bash
MATLAB=/is/software/matlab/linux/R2020a
MATLAB=/is/software/matlab/linux/R2014a
...
```
So the MATLAB version does update, but only for the direct call (the first one, without echo_and_run). The call through echo_and_run, which is how the real MATLAB command is run, still uses the default R2014a version.
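One possible explanation, which the output above does not confirm: if the matlab modulefile makes `matlab` an alias (modulefiles can do this with `set-alias`) instead of prepending to PATH, then PATH stays identical and the switch only applies where bash performs alias expansion. Bash expands aliases only for a command word it parses directly, and non-interactive scripts do not expand aliases at all unless `expand_aliases` is set. A word that reaches bash through "$@" inside a function is never alias-expanded. That would fit all three results: R2014a twice when the script is executed, and R2020a for the direct call but R2014a through echo_and_run when it is sourced. The self-contained sketch below uses a hypothetical alias `mytool` to demonstrate only the mechanism:

```bash
#!/bin/bash
# Demo with a hypothetical alias 'mytool': aliases are expanded only for
# command words bash parses directly, never for words coming from "$@".

# Non-interactive bash ignores aliases by default; enable them here to
# mimic an interactive shell (e.g. when the script is sourced).
shopt -s expand_aliases

echo_and_run() { echo "$*" ; "$@" ; }

# Stand-in for an alias a modulefile might define (assumption).
alias mytool='echo "resolved via alias"'

# Direct call: 'mytool' is alias-expanded when this line is parsed.
mytool

# Through the wrapper: "$@" is not alias-expanded, so bash looks for a
# function, builtin or PATH entry named 'mytool' and finds none.
echo_and_run mytool 2>/dev/null || echo "mytool not found via \"\$@\""
```

This prints "resolved via alias", then "mytool" (from the wrapper's echo), then the "not found" message.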
What is going on here? I checked $PATH and it is identical to the one in my running shell. Sourcing ~/.bashrc at the top of the script made no difference. When I run "type module", I can see that it is a function:
```
module is a function
module ()
{
    _module_raw "$@" 2>&1
}
```
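Since PATH is identical, a reasonable next check is how bash itself resolves the name matlab right after the module load lines, and what the module system reports as loaded. The sketch below is a hypothetical diagnostic: `show_cmd_resolution` is an illustrative helper name, and it relies on `type -a`, which lists every alias, function, builtin and PATH entry for a name in the order bash would try them.

```bash
#!/bin/bash
# Hypothetical diagnostic: place it right after the 'module load' lines, or
# paste it into the shell in which the script was sourced.

# show_cmd_resolution (illustrative name) prints every alias, function,
# builtin and PATH entry for a name, in the order bash would try them.
show_cmd_resolution() {
    local name="$1"
    type -a "$name" 2>/dev/null || echo "$name: not found"
}

# Ask the module system which modules it believes are loaded.
if type module >/dev/null 2>&1; then
    module list
else
    echo "module: function not defined in this shell"
fi

show_cmd_resolution matlab
```

If the first line reported for matlab is something like "matlab is aliased to ..." or "matlab is a function", that would explain why comparing PATH alone did not reveal the switch.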
Some older posts say that I should either run the script with "source" (or "." in sh), which I cannot do because the script will eventually be launched by condor_submit, or find the file that defines module. However, I do not know which file (besides ~/.bashrc) that could be.
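To find where the module function comes from, bash can report the file and line in which a function was defined. The commands below are meant to be pasted into an interactive shell where module works. The list of startup files in the grep is an assumption (sites differ), and the reported location may be less useful if the function was created through eval or inherited from the parent environment.

```bash
# Paste into an interactive shell in which 'module' works.

# With extdebug enabled, 'declare -F NAME' also prints the line number
# and the source file in which each function was defined.
shopt -s extdebug
declare -F module _module_raw
shopt -u extdebug

# Exported functions are passed to child processes as BASH_FUNC_<name>%%
# environment variables; check whether module arrives that way.
env | grep -E '^BASH_FUNC_(module|_module_raw)%%=' \
    || echo "module/_module_raw not inherited via the environment"

# Common bash startup files that may initialize Environment Modules.
# This list is an assumption; sites and distributions differ.
grep -ls 'modules' /etc/profile /etc/profile.d/*.sh /etc/bash.bashrc \
    ~/.bashrc ~/.bash_profile ~/.profile 2>/dev/null \
    || echo "no listed startup file mentions 'modules'"
```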
Edit
I am currently testing everything locally (in the login shell), which may differ from the shell on the execution node, but even here I get this strange behaviour.
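To reproduce something closer to a batch job locally, the script can be run with an almost empty environment and without bash startup files. This is only an approximation: the environment HTCondor actually provides depends on the submit description (for example whether `getenv` is set). `run_clean` is a hypothetical helper, and the minimal PATH it sets is an assumption.

```bash
#!/bin/bash
# run_clean (hypothetical helper): run bash with an almost empty
# environment and without startup files. Exported functions such as
# BASH_FUNC_module%% are dropped as well, roughly like a batch job that
# does not inherit the submitting shell's environment.
run_clean() {
    # The minimal PATH below is an assumption; adjust for your system.
    env -i HOME="$HOME" USER="${USER:-}" PATH=/usr/bin:/bin \
        bash --noprofile --norc "$@"
}
```

For example, `run_clean ./scriptname.sh`. If bash then reports that module is not found, the script relies on a module function inherited from the submitting shell rather than setting it up itself.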
Edit II:
```
+ type -a _module_raw
_module_raw is a function
_module_raw ()
{
    unset _mlshdbg;
    if [ "${MODULES_SILENT_SHELL_DEBUG:-0}" = '1' ]; then
        case "$-" in
            *v*x*)
                set +vx;
                _mlshdbg='vx'
            ;;
            *v*)
                set +v;
                _mlshdbg='v'
            ;;
            *x*)
                set +x;
                _mlshdbg='x'
            ;;
            *)
                _mlshdbg=''
            ;;
        esac;
    fi;
    unset _mlre _mlIFS;
    if [ -n "${IFS+x}" ]; then
        _mlIFS=$IFS;
    fi;
    IFS=' ';
    for _mlv in ${MODULES_RUN_QUARANTINE:-};
    do
        if [ "${_mlv}" = "${_mlv##*[!A-Za-z0-9_]}" -a "${_mlv}" = "${_mlv#[0-9]}" ]; then
            if [ -n "`eval 'echo ${'$_mlv'+x}'`" ]; then
                _mlre="${_mlre:-}${_mlv}_modquar='`eval 'echo ${'$_mlv'}'`' ";
            fi;
            _mlrv="MODULES_RUNENV_${_mlv}";
            _mlre="${_mlre:-}${_mlv}='`eval 'echo ${'$_mlrv':-}'`' ";
        fi;
    done;
    if [ -n "${_mlre:-}" ]; then
        eval `eval ${_mlre}/usr/bin/tclsh8.6 /usr/lib/x86_64-linux-gnu/modulecmd.tcl bash '"$@"'`;
    else
        eval `/usr/bin/tclsh8.6 /usr/lib/x86_64-linux-gnu/modulecmd.tcl bash "$@"`;
    fi;
    _mlstatus=$?;
    if [ -n "${_mlIFS+x}" ]; then
        IFS=$_mlIFS;
    else
        unset IFS;
    fi;
    unset _mlre _mlv _mlrv _mlIFS;
    if [ -n "${_mlshdbg:-}" ]; then
        set -$_mlshdbg;
    fi;
    unset _mlshdbg;
    return $_mlstatus
}
```
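The dump shows that _module_raw evals whatever modulecmd.tcl prints for the bash shell. Running the same command without eval shows the shell code that "module load matlab/R2020a" would execute, for instance whether it changes PATH or defines an alias or function for matlab. The sketch reuses the two paths from the dump; `show_module_code` is a hypothetical name, and it skips the MODULES_RUN_QUARANTINE handling that _module_raw performs.

```bash
#!/bin/bash
# show_module_code (hypothetical name): print, without evaluating it, the
# bash code that modulecmd.tcl generates for a module sub-command.
# The two paths are copied from the _module_raw dump; this skips the
# MODULES_RUN_QUARANTINE handling that _module_raw performs.
show_module_code() {
    local tclsh=/usr/bin/tclsh8.6
    local modcmd=/usr/lib/x86_64-linux-gnu/modulecmd.tcl
    if [ ! -x "$tclsh" ] || [ ! -r "$modcmd" ]; then
        echo "modulecmd.tcl not found at the paths used by _module_raw" >&2
        return 1
    fi
    "$tclsh" "$modcmd" bash "$@"
}

show_module_code load matlab/R2020a
```

An assignment to PATH or an alias definition for matlab in this output would indicate which mechanism the modulefile uses.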
The provided script output does not show whether the problem lies at the Environment Modules level.
Adding a "module list" command to your script after the three "module load" commands may help determine whether the module function has properly loaded your environment.

In some situations, such as running a script on a cluster through a batch scheduler, it is good practice to source the module initialization script at the start of the script to ensure that the module function is defined. It seems that you are running on a Debian-like system, so the initialization script may be sourced with: