I've been building a Singularity container to run some Python code, and despite reading the Singularity docs, I can't understand the errors and behavior I'm seeing.
First, the container is Ubuntu 18.04 bootstrapped from Docker, i.e.:
Bootstrap: docker
From: ubuntu:18.04
I need to use a Python module (neuron) that has to be compiled beforehand. I compile the code in the %post section of the definition file and add the environment variables:
echo 'export PATH=$PATH:/usr/local/nrn/x86_64/bin' >>$SINGULARITY_ENVIRONMENT
echo 'export LD_LIBRARY_PATH=/usr/local/nrn/x86_64/lib:$LD_LIBRARY_PATH' >>$SINGULARITY_ENVIRONMENT
I can build the container without too many issues (using sudo singularity build --sandbox). But I've been trying to run a test script (test.py) to make sure everything works as expected. In the script I import the module in question (neuron) and then just try to save a list to a CSV file to make sure I can save data properly. It looks something like this:
import neuron #this fails and gives an unusual error in specific circumstances I don't understand (described below)
import numpy as np
some_data = [1,2,3]
np.savetxt('test_results.csv',np.asarray(some_data),delimiter=',')
Depending on the flags I pass to singularity exec, I get different results, which I don't understand (and I don't know where to start: is this a neuron, Singularity, or Ubuntu issue?).
For completeness, the container and test.py are both in the directory I'm running these commands from (dir in my example). If I mount $HOME, by not using the --no-home flag, and try to run test.py like this:
singularity exec --writable --bind /home/bidby/path/to/some/dir:/mnt my_container.simg python3 /mnt/test.py
I get an error like this:
dlopen failed - x86_64/.libs/libnrnmech.so: undefined symbol: celsius
I've tried googling this a fair bit, and it might be a C++ linking error (but I only really know Python, so debugging this hasn't been easy).
However, if I use the --no-home flag, i.e.:
singularity exec --no-home --writable --bind /home/bidby/path/to/some/dir:/mnt my_container.simg python3 /mnt/test.py
then the module imports successfully and a new error arises:
Traceback (most recent call last):
  File "/mnt/test.py", line 15, in <module>
    np.savetxt('test_results.csv',np.asarray(some_data),delimiter=',')
  File "/usr/local/lib/python3.6/dist-packages/numpy/lib/npyio.py", line 1352, in savetxt
    open(fname, 'wt').close()
PermissionError: [Errno 13] Permission denied: 'test_results.csv'
I've been googling this continuously for several days now, but I can't figure out what the problem is. From what I've learnt and tested, I figure it might be something to do with how environment variables are passed into the container, although why I don't have permission to save here is beyond me. But I feel this might be resolved if I can understand why using the --no-home flag affects the module import.
This may not help solve the problem, but here are other things I've noticed or tried:
If I use the --containall flag, I can run test.py with no problem, but then the CSV file I try to save can never be found. I checked the docs, which say:
Using the --containall (or -C for short) flag, $HOME is not mounted and a dummy bind mount is created at the $HOME point. You cannot use -B (or --bind) to bind your $HOME directory because it creates an empty mount. So if you have files located in the image at /home/user, the --containall flag will hide them all.
and I presume this "dummy bind mount" is where the file is being written to, hence why I can never actually find it.
If I shell into the container with sudo and the --writable flag, I can import neuron without any problem. If I don't use either of those flags, then I get the same "undefined symbol" error from above.
If I don't export the LD_LIBRARY_PATH then I get a different dlopen error referring to a different .so file, saying that the file doesn't exist - this reaffirms my thinking that it's a path problem.
I know I haven't included enough code to reproduce this error, since I'm guessing no one has the time/energy to build this container (since it's fairly large) but I think I've included the most relevant parts. Will be happy to add more if needed though.
Debugging this has been a nightmare for me, and if anyone can point me in the right direction of what I should be googling I would appreciate it a lot.
That certainly looks like an environment problem, but those are even harder to debug remotely than locally. One thing that can help: use the --cleanenv option as well as --no-home. That should make your container environment independent of the host environment. This may clear up your issue, or at least point to a new one.
The write error you show next is coming from your test script, where you're using a relative path for the file you're writing. Generally that means it's being written to the directory the script was executed from. Do you have write permission to that directory, or is there an existing file with that name that you don't have write permission to?
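To see where that relative path actually lands, something like the following may help. It is only a minimal sketch, not part of your script: `describe_write_target` is a hypothetical helper name, and it assumes nothing beyond the Python standard library. It reports the absolute path a bare filename resolves to and whether the directory (and any existing file) is writable by the current user.

```python
import os
import sys


def describe_write_target(filename, base_dir=None):
    """Report where a relative filename resolves and whether it is writable.

    Hypothetical helper for illustration. base_dir defaults to the current
    working directory, which is where a bare relative path ends up.
    """
    base_dir = os.path.abspath(base_dir or os.getcwd())
    target = os.path.join(base_dir, filename)
    exists = os.path.exists(target)
    return {
        "target": target,
        "dir_writable": os.access(base_dir, os.W_OK),
        "file_exists": exists,
        "file_writable": exists and os.access(target, os.W_OK),
    }


if __name__ == "__main__":
    # Run this inside the container right before the np.savetxt call.
    info = describe_write_target("test_results.csv")
    for key, value in info.items():
        print(f"{key}: {value}", file=sys.stderr)
```

If the reported directory is not the one you expected, passing an absolute path under the bind mount to np.savetxt (for example /mnt/test_results.csv, assuming the bound directory is writable by your user) would remove the dependence on the working directory.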
Probably unrelated, but: what version of Singularity are you using? .simg was used in the 2.x docs, while .sif is generally used for 3.x. If you're still on 2.x, I strongly recommend updating to the most recent 3.x you can. The 2.x line is no longer being developed, and most versions < 2.6.1 have security issues. If your cluster admins are being slow to update, pointing this out can help motivate them.
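Coming back to the environment question: one way to narrow it down is to run the same small script under each set of flags and compare the output. The sketch below is an assumption-laden diagnostic, not a fix. `snapshot_environment` is a hypothetical name, the list of variables is just a guess at what matters here, and the check for the mechanism library assumes the relative path in your dlopen error is resolved against the current working directory.

```python
import json
import os

# Relative path copied from the dlopen error message in the question.
MECH_LIB = os.path.join("x86_64", ".libs", "libnrnmech.so")


def snapshot_environment(var_names=("HOME", "PATH", "LD_LIBRARY_PATH", "PYTHONPATH")):
    """Collect runtime state that may differ between singularity exec flags."""
    return {
        "cwd": os.getcwd(),
        # Unset variables are reported as None rather than raising KeyError.
        "env": {name: os.environ.get(name) for name in var_names},
        # Assumption: the relative path in the error resolves against cwd.
        "mech_lib_in_cwd": os.path.exists(MECH_LIB),
    }


if __name__ == "__main__":
    print(json.dumps(snapshot_environment(), indent=2))
```

Running it once with --no-home and once without, then diffing the two outputs, should show whether the working directory or one of the variables changes between the runs.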