Singularity behaviour: shell vs exec


So I'm trying to debug an error I got on an HPC setup I have access to. I won't go into details about the error, since it's package-specific and I'm pretty sure this is an environment-variable kind of problem. That said, the package is NEURON, and if anyone has experience with it and Singularity I would appreciate your input.

When I tested everything locally using:

singularity exec --bind ./:/mnt container.sif my_script.py 

there were no problems. However, the same command ran into an error on the HPC cluster. I set about trying to recreate the error locally to see what the problem was.

For reasons still unknown to me, the error I got on the cluster can be reproduced locally by adding the --containall flag to the exec command. In fact, even the --contain flag can reproduce the error. I can see from the docs that --contain will:

use minimal /dev and empty other directories (e.g. /tmp and $HOME) instead of sharing filesystems from your host

which makes me guess it's a path/environment problem, but I'm not 100% sure, since I am still new-ish to everything that isn't Python.

To try to solve the problem, I used singularity shell to recreate the error. And this is where I hope someone can elucidate matters for me. If I do this:

singularity shell --containall --bind ./:/mnt container.sif
cd /mnt
python3 my_script.py

The script runs fine and I get no errors. However, when I run:

singularity exec --containall --bind ./:/mnt container.sif python3 /mnt/my_script.py

I get the same error as I got on the cluster.

What is different about these two approaches? Why might shelling into the container work, while executing it like this does not? I am just looking for help figuring out how to debug this.

Additionally, why might the scripts run locally but not on the HPC? My understanding of containers is that they are supposed to allow scripts to be run on different systems because everything is, well, contained in the container. What am I allowing through in these different scenarios that's stopping me from running my code?

My instincts (which aren't exactly experienced) tell me that there is some environment variable that I am carrying through when I shell in (or when I run the scripts locally) that I am losing when I run it in the other ways, but I am not sure where to begin looking for such a thing, or how to keep it in the container.
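In case it's useful, this is roughly how I was planning to compare the environments (just a sketch; the output filenames are placeholders, and I'm assuming env is available inside the image):

# Dump the container environment with and without --containall, then diff them
singularity exec --containall --bind ./:/mnt container.sif env | sort > env_containall.txt
singularity exec --bind ./:/mnt container.sif env | sort > env_default.txt
diff env_containall.txt env_default.txt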

EDIT:

I also just tried shelling into the container while on the HPC cluster, and I get the same error. So there's something on my local machine that is being used when I shell in, or when I execute the script without the --contain flag.

Versions:

  • Singularity 3.5
  • Python 3.6.9
  • NEURON 8.0
1 Answer

Sounds like an environment issue: you have something set in your dev environment that doesn't exist in your cluster environment. By default, all of your environment variables are automatically forwarded into the Singularity environment. I recommend using -e/--cleanenv to catch that. When using it, only variables prefixed with SINGULARITYENV_ are passed into the container environment. For example, to have NEURON_HOME=/mnt/neuron inside the container, you would run export SINGULARITYENV_NEURON_HOME=/mnt/neuron before running the singularity command.
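Putting that together, the invocation might look like this (a sketch; NEURON_HOME=/mnt/neuron is only an example value, use whatever your setup actually requires):

# Clean the host environment; only SINGULARITYENV_-prefixed variables get through
export SINGULARITYENV_NEURON_HOME=/mnt/neuron
singularity exec --cleanenv --bind ./:/mnt container.sif python3 /mnt/my_script.py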

Once you figure out which variable needs to be updated, you can add it normally in %environment or %post, whichever you prefer. If the value changes depending on the environment, you can export it via SINGULARITYENV_VARNAME instead.
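For instance, the relevant section of the definition file might look like this (a sketch; NEURON_HOME and its path are placeholders until you've identified the actual variable):

%environment
    # Set inside the container at runtime, regardless of the host environment
    export NEURON_HOME=/mnt/neuron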