Issues with Docker Swarm running TeamCity using rexray/ebs for drive persistence in AWS EBS


I'm quite new to Docker but have started thinking about production set-ups, hence needing to crack the challenge of data persistence when using Docker Swarm. I decided to start by creating my deployment infrastructure (TeamCity for builds and NuGet, plus the official registry image (https://hub.docker.com/_/registry/) for storing images).

I've started with TeamCity. Obviously this needs data persistence in order to work. I am able to run TeamCity in a container with an EBS drive, and everything initially looks like it is working just fine - TeamCity works through the set-up steps and my TeamCity drives appear in AWS EBS - but then the worker node that TeamCity is allocated to shuts down and the install process stops.

Here are all the steps I'm following:

Phase 1 - Machine Setup:

Phase 2 - Configure the Docker remote API on the master:

$ sudo docker run -p 2375:2375 --rm -d -v /var/run/docker.sock:/var/run/docker.sock jarkt/docker-remote-api
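(Before moving on, the remote API can be sanity-checked from another machine, assuming the security group allows traffic on port 2375 - the address below is just a placeholder for the master's IP:)

$ docker -H tcp://{master IP address}:2375 version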

Phase 3 - Install the rexray/ebs plugin on all machines:

$ sudo docker plugin install --grant-all-permissions rexray/ebs REXRAY_PREEMPT=true EBS_ACCESSKEY=XXX EBS_SECRETKEY=YYY

[I lifted the correct values from AWS for XXX and YYY]

  • I test this using:

    $ sudo docker volume create --driver=rexray/ebs --name=delete --opt=size=2

    $ sudo docker volume rm delete

  • All three nodes are able to create and delete drives in AWS EBS with no issue.
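  • As an extra (optional) check, listing the installed plugins on each node confirms that rexray/ebs is present and enabled:

    $ sudo docker plugin ls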

Phase 4 - Set up the swarm:

  • Run this on the master:

    $ sudo docker swarm init --advertise-addr eth0:2377

  • This gives the command to run on each of the workers, which looks like this:

    $ sudo docker swarm join --token XXX 1.2.3.4:2377

  • These execute fine on the worker machines.
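  • As an optional sanity check, running this on the master should list all three nodes, with the master shown as the manager/leader:

    $ sudo docker node ls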

Phase 5 - Set up visualisation using remote PowerShell on my local machine:

$ $env:DOCKER_HOST="{master IP address}:2375"

$ docker stack deploy --with-registry-auth -c viz.yml viz

viz.yml looks like this:

version: '3.1'

services:
    viz:
        image: dockersamples/visualizer
        volumes:
            - "/var/run/docker.sock:/var/run/docker.sock"
        ports:
            - "8080:8080"
        deploy:
            placement:
                constraints:
                    - node.role==manager
  • This works fine and allows me to visualise my swarm.
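  • (For completeness, the stack can also be checked from the same remote PowerShell session; this just confirms the visualizer task is running on the manager node:)

    $ docker stack ps viz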

Phase 6 - Install TeamCity using remote PowerShell on my local machine:

$ docker stack deploy --with-registry-auth -c docker-compose.yml infra

docker-compose.yml looks like this:

version: '3'

services:
  teamcity:
    image: jetbrains/teamcity-server:2017.1.2
    volumes:
        - teamcity-server-datadir:/data/teamcity_server/datadir
        - teamcity-server-logs:/opt/teamcity/logs
    ports:
        - "80:8111"

volumes:
  teamcity-server-datadir:
    driver: rexray/ebs
  teamcity-server-logs:
    driver: rexray/ebs
  • [Incorporating NGINX as a proxy is a later step on my to-do list.]

  • I can see both of the required drives appear in AWS EBS and the container appears in my swarm visualisation.

  • However, after the TeamCity progress screen has been showing for a while, the worker machine hosting the TeamCity instance shuts down and the process abruptly ends.

  • I'm at a loss as to what to do next. I'm not even sure where to look for logs.

Any help gratefully received!

Cheers,

Steve.

I found a way to get logs for my service. First do this to list the services the stack creates:

$ sudo docker service ls 

Then do this to see logs for the service:

$ sudo docker service logs --details {service name}

Now I just need to wade through the logs and see what went wrong...
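In case it is useful to anyone else, the task history for the service can also be inspected - with --no-trunc the error column isn't cut short - to see why tasks have stopped:

$ sudo docker service ps --no-trunc {service name}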


Update

I found the following error in the logs:

infra_teamcity.1.bhiwz74gnuio@ip-172-31-18-103    |  [2018-05-14 17:38:56,849]  ERROR - r.configs.dsl.DslPluginManager - DSL plugin compilation failed
infra_teamcity.1.bhiwz74gnuio@ip-172-31-18-103    |  exit code: 1
infra_teamcity.1.bhiwz74gnuio@ip-172-31-18-103    |  stdout: #
infra_teamcity.1.bhiwz74gnuio@ip-172-31-18-103    |  # There is insufficient memory for the Java Runtime Environment to continue.
infra_teamcity.1.bhiwz74gnuio@ip-172-31-18-103    |  # Native memory allocation (mmap) failed to map 42012672 bytes for committing reserved memory.
infra_teamcity.1.bhiwz74gnuio@ip-172-31-18-103    |  # An error report file with more information is saved as:
infra_teamcity.1.bhiwz74gnuio@ip-172-31-18-103    |  # /opt/teamcity/bin/hs_err_pid125.log
infra_teamcity.1.bhiwz74gnuio@ip-172-31-18-103    |
infra_teamcity.1.bhiwz74gnuio@ip-172-31-18-103    |  stderr: Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000e2dfe000, 42012672, 0) failed; error='Cannot allocate memory' (errno=12)

This makes me think it is a memory problem. I'm going to try again with a larger AWS instance and see how I get on.
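If a bigger instance doesn't fix it on its own, another option I'm considering (untested on my side, and the figure below is just a placeholder) is to add a memory reservation to the existing docker-compose.yml so the scheduler only places the TeamCity task on a node with enough free RAM:

services:
  teamcity:
    deploy:
      resources:
        reservations:
          memory: 2G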


Update 2

Using a larger AWS instance solved the issue. :)

I then discovered that rexray/ebs doesn't like it when a container switches between hosts in my swarm - it duplicates the EBS volumes so that it keeps one per machine. My solution to this was to use an EFS drive in AWS and mount it on each possible host. I then updated the fstab file so that the drive is remounted on every reboot (a rough sketch of the mount and fstab entry is below). Job done. Now to look into using a reverse proxy...
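For reference, the mount on each host looks roughly like this - the filesystem ID, region and mount point are placeholders for my own values:

$ sudo mkdir -p /mnt/efs
$ sudo mount -t nfs4 -o nfsvers=4.1 fs-XXXXXXXX.efs.eu-west-1.amazonaws.com:/ /mnt/efs

and the matching /etc/fstab entry so the mount comes back after a reboot:

fs-XXXXXXXX.efs.eu-west-1.amazonaws.com:/ /mnt/efs nfs4 nfsvers=4.1,_netdev 0 0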