Container Optimized OS Docker Shutdown Behavior


I have been deploying containers on GCP Compute Engine VMs using Google's Container-Optimized OS. I am struggling to understand what happens to the deployed containers when the host VM is stopped in GCP.

When my containers receive a SIGTERM or SIGINT signal, they perform some cleanup behavior and write some files into mounted volumes. I have tested this extensively with docker stop and docker kill -s SIGINT. However, this behavior doesn't seem to be occurring when I stop the host machine in GCP.
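For illustration, the kind of signal handling described above can be sketched in a few lines of shell. This is a hypothetical entrypoint, not the actual logic of the image I am deploying; STATE_DIR, cleanup, and run_with_cleanup are made-up names.

```shell
#!/bin/sh
# Hypothetical entrypoint sketch; all names and paths are illustrative.
STATE_DIR="${STATE_DIR:-/tmp/graceful-demo}"

cleanup() {
    # Write a marker into the (mounted) directory so the save is observable.
    mkdir -p "$STATE_DIR"
    date -u +%Y-%m-%dT%H:%M:%SZ > "$STATE_DIR/last-clean-shutdown"
}

run_with_cleanup() {
    # On SIGTERM or SIGINT: stop the workload, run the cleanup, then exit.
    trap 'kill "$child" 2>/dev/null; cleanup; exit 0' TERM INT
    # Placeholder workload. It runs in the background so that `wait`
    # can be interrupted by the signal and the trap fires promptly.
    sleep 3600 &
    child=$!
    wait "$child"
}
```

A real entrypoint would end with a call to run_with_cleanup. If the marker file is missing after the VM is stopped, the container most likely never received the signal or was killed before the handler finished.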

I'm not entirely sure how to debug this process. I tried attaching to the VM's serial console, but it doesn't seem to have any info pertaining to the container shutdown logic.

Any guidance would be much appreciated! For reference, the image I am deploying is lloesche/valheim-server.


Full reproduction steps:

Create a new "Compute Engine" VM with "Deploy a container image to this VM" enabled. I have been using an e2-medium with a 20GB boot disk.

Use the "lloesche/valheim-server" image.

Set the following env variables:

SERVER_NAME: Test
WORLD_NAME: Test
SERVER_PASS: Password # must be at least 5 characters

Add a volume mount of type "Directory" with "/config" as the mount path and "/home/YOUR_GCP_USERNAME/valheim-server-config" as the host path in "Read/write" mode.

After the container starts up, you should have the image running on the host machine (lloesche/valheim-server). You should also have a file called Test.fwl created in ~/valheim-server-config/worlds/.

Now, stopping this container (docker stop) should cause a write to that file. You can verify this by stopping the container and then observing that file's modified date.
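The check can be scripted roughly as follows. This is a minimal sketch that assumes GNU stat is available on the host; mtime_of and modified_since are hypothetical helper names, and the path in the usage comment is a placeholder.

```shell
# Print a file's modification time as seconds since the epoch (GNU stat).
mtime_of() {
    stat -c %Y "$1"
}

# Succeed if the file was modified after the given epoch timestamp.
modified_since() {
    [ "$(mtime_of "$1")" -gt "$2" ]
}

# Usage on the VM (placeholder path, adjust to your world file):
#   before=$(date +%s); sleep 1
#   docker stop $(docker ps -q)
#   modified_since path/to/world-file "$before" && echo "saved on stop"
```

Running the same comparison after stopping and restarting the host shows whether the save happened during VM shutdown.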

However, this process doesn't seem to be occurring when the host instance is stopped. If you restart the host so the container is again running, then issue a "stop" to the host, that file isn't saved before the container is killed.


Wojtek_B

I went through the logs and found nothing that pointed to a solution.

There may, however, be a workaround.

You can use a shutdown script to stop your containers more gracefully before the VM shuts down.

You can provide the script using the gcloud command:

gcloud compute instances create example-instance \
    --metadata-from-file shutdown-script=examples/scripts/install.sh

Or, in the Cloud Console, specify the shutdown script directly using the shutdown-script metadata key:

1. In the Cloud Console, go to the VM instances page.
2. Click Create instance.
3. On the Create a new instance page, fill in the properties for your instance.
4. For advanced configuration options, expand the Management, security, disks, networking, sole tenancy section.
5. In the Metadata section, fill in shutdown-script as the metadata key.
6. In the Value box, supply the contents of your shutdown script.
7. Click Create to create the instance.
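As a rough idea of what such a shutdown script could contain, here is a minimal sketch. It is untested on Container-Optimized OS and assumes the docker CLI is on the PATH and the Docker daemon is still running when the script executes; DOCKER, STOP_TIMEOUT, and stop_all_containers are hypothetical names.

```shell
#!/bin/bash
# Hypothetical shutdown script body (assumption: the daemon is still up).
DOCKER="${DOCKER:-docker}"
STOP_TIMEOUT="${STOP_TIMEOUT:-60}"

stop_all_containers() {
    local ids
    ids="$("$DOCKER" ps -q)"
    # Nothing to do when no containers are running.
    if [ -z "$ids" ]; then
        return 0
    fi
    # Word splitting is intended here: one argument per container ID.
    # docker stop sends SIGTERM, then SIGKILL after the timeout.
    "$DOCKER" stop -t "$STOP_TIMEOUT" $ids
}
```

The script would end with a call to stop_all_containers. The timeout should be long enough for the container to finish its cleanup but shorter than the time the platform allows for shutdown scripts.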

If that does not help, you can open a new issue on the Google Issue Tracker and describe the behavior you expect.

Michael Korn

I had the same problem and found a workaround (not perfect, but it works for me). Add the following to your startup script:

mkdir -p /etc/systemd/system/docker.service.d
printf "[Service]\nExecStop=/bin/sh -c 'docker stop \$(docker ps -q)'" > /etc/systemd/system/docker.service.d/override.conf

Usually (and also in this case for testing) you can edit the override file (which adds your config to the existing config) with sudo systemctl edit docker.service. Unfortunately, the override file is apparently deleted every time the system starts, which is why I persisted it via the startup-script.

Before this approach I tried what Wojtek_B suggested (sorry, my reputation is too low to comment directly), but that did not work. The reason is that the Docker daemon gets the termination signal before the shutdown script is processed. Since calling docker from the shutdown script on Container-Optimized OS fails (or is at least risky), this could be regarded as a bug.

Micah Smith

Expanding on the answer of @Michael Korn, which did work for me, I'd suggest the following full startup script:

#!/bin/bash

# ensure SIGTERM is sent to ALL docker containers if the instance is killed
mkdir -p /etc/systemd/system/docker.service.d
cat <<EOF >/etc/systemd/system/docker.service.d/override.conf
[Service]
ExecStop=/bin/sh -c 'docker ps -q | xargs docker stop --signal TERM --time 60'
EOF
systemctl daemon-reload
systemctl restart docker

The Docker systemd unit has already started by the time the startup script writes the override, so systemd first needs to re-read the unit configuration (daemon-reload), and then the Docker unit needs to be restarted.

Example command if using "Containers on Compute Engine" via create-with-container (untested in this exact minimal form, sorry):

gcloud compute instances create-with-container test \
  --container-image=gcr.io/your-image:latest \
  --create-disk=auto-delete=yes,device-name=test,image-project=cos-cloud,image-family=cos-101-lts,mode=rw,size=10GB,type=pd-balanced \
  --metadata-from-file=startup-script=path/to/startup-script.sh