I'm a bit new to docker and trying to simulate a cluster environment with it. I have defined a custom docker network that the containers share, and assign each container to a different port to simulate different network cards.
Currently, I have a working Dockerfile that copies over the needed ssh keys and I automatically have it start the ssh server with ENTRYPOINT service ssh start && bash.
Right now, my containers work, but the inconvenience is that when the containers start I have to manually run eval ssh-agent && ssh-add /.ssh/docker_id_rsa, then manually ssh into all the other containers, and then I am able to run my MPI program. If I don't do these steps first, I am not able to run the program across the containers.
So what I'd like to do is when I attach to one of the containers, I want to either (1) immediately run my MPI program across all of the containers without having to run all the steps I mentioned above, or (2) even just immediately ssh into the other containers, and then run my program.
Here is an example of my current Dockerfile:
FROM img_base AS img
COPY /keys/ /root/.ssh
COPY /keys/docker_id_rsa.pub /root/.ssh/authorized_keys
RUN sed -i 's/#PermitRootLogin no/PermitRootLogin yes/g' /etc/ssh/sshd_config
RUN sed -i 's/#PasswordAuthentication yes/PasswordAuthentication no/g' /etc/ssh/sshd_config
RUN sed -i "s+StrictHostKeyChecking .*+StrictHostKeyChecking allow-new+" /etc/ssh/sshd_config
RUN echo "localhost" >> hostfile
RUN echo "root@container2" >> hostfile
RUN echo "root@container3" >> hostfile
RUN echo "root@container4" >> hostfile
EXPOSE 22
ENTRYPOINT service ssh start && bash && eval `ssh-agent` && ssh-add /root/.ssh/docker_id_rsa
I start my containers with the following bash script:
#!/bin/bash
docker run --rm -dit --name container1 --network=my-net --ip=172.18.0.2 -p 4022:22 --add-host container2:172.18.0.3 --add-host container3:172.18.0.4 --add-host container4:172.18.0.5 img
docker run --rm -dit --name container2 --network=my-net --ip=172.18.0.3 -p 3022:22 --add-host container1:172.18.0.2 --add-host container3:172.18.0.4 --add-host container4:172.18.0.5 img
docker run --rm -dit --name container3 --network=my-net --ip=172.18.0.4 -p 5022:22 --add-host container2:172.18.0.3 --add-host container1:172.18.0.2 --add-host container4:172.18.0.5 img
docker run --rm -dit --name container4 --network=my-net --ip=172.18.0.5 -p 6022:22 --add-host container2:172.18.0.3 --add-host container3:172.18.0.4 --add-host container1:172.18.0.2 img
docker attach container1
I have tried adding the eval and ssh-add commands in the ENTRYPOINT command.
I've also tried adding these commands to the docker run commands in the bash script.
And I've tried to do this with a docker-compose file, but I still do not really understand how to use docker-compose.
Any advice or references on the proper way to do this is greatly appreciated.
I'm not sure what your img_base looks like, but I'll just assume that it's an Ubuntu image (or a derivative).
You are setting up SSH access to the containers as the root user. This is not ideal but 100% fine to get things up and running. Perhaps change to a non-privileged user later?
Dockerfile
Testing the image. Connecting port 2022 on the host to avoid conflict with SSHD running on host.
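A test along these lines can be sketched as a small shell function. This is only a sketch under assumptions: the image tag img and the -dit flags are taken from the question's docker run commands, while the function name build_and_check, the container name ssh-test, and the host-side key path keys/docker_id_rsa are hypothetical and would need to match your setup.

```shell
#!/bin/bash
# Sketch: build the image, start a throwaway container, and check SSH via a host port.
# Assumptions: run from the build context; the private key sits in keys/docker_id_rsa.

build_and_check() {
  local image="${1:-img}" name="${2:-ssh-test}" port="${3:-2022}"

  docker build -t "$image" . || return 1
  # Map the chosen host port to the container's SSH port 22.
  docker run --rm -dit --name "$name" -p "${port}:22" "$image" || return 1
  sleep 2  # give sshd a moment to start

  # Print the container's hostname over SSH; accept-new trusts a new host key on first contact.
  ssh -p "$port" -i keys/docker_id_rsa -o StrictHostKeyChecking=accept-new root@localhost hostname
  local status=$?

  docker stop "$name" >/dev/null
  return "$status"
}
```

Calling build_and_check with no arguments uses port 2022 on the host; if the function returns 0, the SSH login as root succeeded.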
SSH connection confirmed. ✅
Now let's get this working with Docker Compose.
Dockerfile
docker-compose.yml
The container1 service is slightly different because it runs the setup.sh script. This script (see below) will run code on the other three containers via SSH. So you can use this to set up all of the containers. For the moment though it just prints a message on each of the containers.
setup.sh
Launch.
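For reference, a setup.sh along the lines described above could look roughly like the following. It is a minimal sketch, not the exact script: the worker names come from the --add-host entries in the question, the key path from the question's Dockerfile, and the function name run_on_all is hypothetical. Passing the key with -i should make the manual ssh-agent and ssh-add step unnecessary, assuming the key has no passphrase.

```shell
#!/bin/bash
# Hypothetical setup.sh sketch: run one command on every worker container over SSH.

# Worker names as used in the question's --add-host entries.
WORKERS="container2 container3 container4"
# Private key path as copied by the question's Dockerfile.
KEY="/root/.ssh/docker_id_rsa"

run_on_all() {
  local cmd="$1" host
  for host in $WORKERS; do
    # -n keeps ssh from reading stdin; accept-new trusts a host key on first contact only.
    ssh -n -i "$KEY" -o StrictHostKeyChecking=accept-new "root@${host}" "$cmd"
  done
}

# Run only when a command is given on the command line.
if [ "$#" -gt 0 ]; then
  run_on_all "$*"
fi
```

Invoked as ./setup.sh 'echo "Hello from $(hostname)"', this would print one greeting per worker, with hostname evaluated on the remote side because of the single quotes.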
So container1 is effectively acting as the master and setting things up on the other containers.