Restarting Docker daemon on host node from within Kubernetes pod

Question


Asked by Eric Meadows on 27 July 2025 at 16:20 · 3.2k views

Goal: Restart the Docker daemon on a GKE node from within a pod

Issue: systemctl cannot connect to the bus ("Failed to connect to bus")

Background: On Google Kubernetes Engine (GKE), I am trying to restart the host node's Docker daemon to enable NVIDIA GPU Telemetry for Kubernetes on nodes that have a GPU. I have isolated the GPU nodes correctly, and I can run any command on the host node from a DaemonSet that runs an initContainer, following the Automatically bootstrapping Kubernetes Engine nodes with daemonSets guide.

At runtime, however, I cannot control the host's Docker daemon from the following pod:

apiVersion: v1
kind: Pod
metadata:
  name: debug
  namespace: gpu-monitoring
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cloud.google.com/gke-accelerator
            operator: Exists
  containers:
  - command:
    - sleep
    - "86400"
    env:
    - name: ROOT_MOUNT_DIR
      value: /root
    image: docker.io/ubuntu:18.04
    imagePullPolicy: IfNotPresent
    name: node-initializer
    securityContext:
      privileged: true
    volumeMounts:
    - mountPath: /root
      name: root
    - mountPath: /scripts
      name: entrypoint
    - mountPath: /run
      name: run
  volumes:
  - hostPath:
      path: /
      type: ""
    name: root
  - configMap:
      defaultMode: 484
      name: nvidia-container-toolkit-installer-entrypoint
    name: entrypoint
  - hostPath:
      path: /run
      type: ""
    name: run

The container runs as UID 0 (root), while the only user directories under /run/user are 1002 and 1003.

To verify that the pod can see and interact with the host Kubernetes (k8s) node, I run:

root@debug:/# chroot "${ROOT_MOUNT_DIR}" ps aux

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0 226124  9816 ?        Ss   Oct13   0:27 /sbin/init

The Issues

Both images (ubuntu:18.04 and solita/ubuntu-systemd)

When I try to restart the Docker daemon on the underlying node, I get the following:

root@debug:/# ls /run/dbus

system_bus_socket

root@debug:/# ROOT_MOUNT_DIR="${ROOT_MOUNT_DIR:-/root}"
root@debug:/# chroot "${ROOT_MOUNT_DIR}" systemctl status docker

Failed to connect to bus: No data available

When attempting to start dbus on the host node:

root@debug:/# export XDG_RUNTIME_DIR=/run/user/`id -u`
root@debug:/# export DBUS_SESSION_BUS_ADDRESS="unix:path=${XDG_RUNTIME_DIR}/bus"
root@debug:/# chroot "${ROOT_MOUNT_DIR}" /etc/init.d/dbus start

Failed to connect to bus: No data available
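One thing worth checking at this point is which bus sockets the pod can actually see. The exports above point DBUS_SESSION_BUS_ADDRESS at a *session* bus under /run/user/0, but per the question only /run/user/1002 and /run/user/1003 exist. As far as I know, systemctl running as root does not use a session bus at all: it normally talks to systemd over /run/systemd/private and falls back to the system bus at /run/dbus/system_bus_socket. The sketch below only reports which of those paths exist as sockets under the host root mount; the function name `check_bus_sockets` is hypothetical, and the default root assumes the ROOT_MOUNT_DIR layout from the pod spec.

```shell
set -eu

# Report whether each bus socket exists under the given root directory.
# Default root: $ROOT_MOUNT_DIR, or /root as in the pod spec (an assumption).
check_bus_sockets() {
  local root uid sock
  root="${1:-${ROOT_MOUNT_DIR:-/root}}"
  uid="$(id -u)"
  # systemd private socket, system bus socket, and this UID's session bus socket
  for sock in run/systemd/private run/dbus/system_bus_socket "run/user/${uid}/bus"; do
    if [ -S "${root%/}/${sock}" ]; then
      echo "present: /${sock}"
    else
      echo "missing: /${sock}"
    fi
  done
}
```

Running `check_bus_sockets` inside the debug pod, and again with `/` as the argument, shows whether the sockets are visible both through the chroot path and through the /run mount.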

Image: solita/ubuntu-systemd

With the same pod spec but the solita/ubuntu-systemd image, I get the following:

root@debug:/# /etc/init.d/dbus start
[....] Starting dbus (via systemctl): dbus.serviceRunning in chroot, ignoring request: start
. ok
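The "Running in chroot, ignoring request" message means systemctl concluded that its own / is not the same directory as PID 1's root. The sketch below performs roughly the same comparison by hand (device and inode of / versus /proc/1/root); it is an approximation of the idea, not systemd's actual code, and the helper names are hypothetical. Reading /proc/1/root generally requires root, so as another user it reports "unknown".

```shell
set -eu

# Print "dev:inode" for a directory (GNU coreutils stat).
file_id() {
  stat -c '%d:%i' "$1/."
}

# Print "yes" if our / differs from the reference root (default: PID 1's root),
# "no" if they are the same directory, "unknown" if either cannot be read.
in_chroot_status() {
  local ref ours theirs
  ref="${1:-/proc/1/root}"
  ours="$(file_id /)" || { echo unknown; return 0; }
  theirs="$(file_id "$ref" 2>/dev/null)" || { echo unknown; return 0; }
  if [ "$ours" = "$theirs" ]; then
    echo no
  else
    echo yes
  fi
}
```

If `in_chroot_status` prints `yes` inside the pod (for example when PID 1 belongs to another container), systemctl is likely to keep ignoring start/restart requests there.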

Configuration Variations Attempted

I have tried the following changes, in nearly every combination, without success:

Changing the image to docker.io/solita/ubuntu-systemd:18.04
Adding shareProcessNamespace: true
Adding mounts for /dev, /proc, and /sys
Restricting the /run mount to /run/dbus and /run/systemd
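Note that shareProcessNamespace: true only shares one PID namespace among the pod's own containers, with the pod's pause container as PID 1; it does not expose the host's processes (that is what the separate hostPID: true field does). A quick heuristic for which case a container is in is to look at PID 1's name. The function below is a hypothetical helper, and the names it matches are typical values, not guarantees.

```shell
set -eu

# Classify the PID namespace from PID 1's command name.
# Optional argument overrides the name read from /proc/1/comm.
describe_pid1() {
  local comm
  comm="${1:-$(cat /proc/1/comm)}"
  case "$comm" in
    systemd|init) echo "host PID namespace (PID 1 is $comm)" ;;
    pause) echo "pod-shared PID namespace (PID 1 is the pause container)" ;;
    *) echo "container's own PID namespace (PID 1 is $comm)" ;;
  esac
}
```

Inside the debug pod above without either option, this would be expected to report the container's own namespace with `sleep` as PID 1.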

Original Q&A


**Eric Meadows** · Answer 1

The answer turned out to be an unexpected workaround. To restart the Docker daemon, first create a firewall rule that lets pods connect to the host node over SSH. Then install the Google Cloud SDK in the pod, use gcloud compute ssh to connect to the node over its internal IP, and restart Docker with a remote command:

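#!/usr/bin/env bash
set -euo pipefail

# Install the tools needed to add the Google Cloud SDK apt repository and to use ssh
apt-get update
apt-get install -y \
  apt-transport-https \
  ca-certificates \
  curl \
  gnupg \
  ssh

# Add the Google Cloud SDK apt repository and its signing key
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" \
  | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg \
  | gpg --dearmor -o /usr/share/keyrings/cloud.google.gpg
apt-get update
apt-get install -y google-cloud-sdk

# Look up this node's details from the GCE metadata server
# (CLUSTER_NAME and MAIN_ZONE are not needed by the ssh command below)
CLUSTER_NAME="$(curl -sS -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/instance/attributes/cluster-name)"
NODE_NAME="$(curl -sS -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/instance/name)"
# The zone is returned as projects/<number>/zones/<zone>; keep only the zone
FULL_ZONE="$(curl -sS -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/instance/zone | awk -F '/' '{print $4}')"
# Region derived from the zone, e.g. us-central1-a -> us-central1
MAIN_ZONE="$(echo "${FULL_ZONE}" | sed 's/\(.*\)-.*/\1/')"

# SSH to the node over its internal IP (requires the firewall rule) and restart Docker;
# --quiet suppresses interactive prompts such as SSH key generation
gcloud compute ssh "${NODE_NAME}" \
  --internal-ip \
  --zone="${FULL_ZONE}" \
  --quiet \
  -- "sudo systemctl restart docker"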
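The answer does not show the firewall step itself. A minimal sketch, assuming the rule should allow TCP port 22 from the cluster's Pod address range to the GKE nodes: the rule name, network, Pod CIDR, and node network tag are placeholders you supply, and `run`, `DRY_RUN`, and `allow_pod_ssh_to_nodes` are hypothetical helpers added only for illustration. The gcloud flags used are standard `gcloud compute firewall-rules create` options.

```shell
set -eu

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Allow SSH (tcp:22) from the Pod CIDR to nodes carrying the given network tag.
# All four arguments are placeholders for your own cluster's values.
allow_pod_ssh_to_nodes() {
  local rule_name network pod_cidr node_tag
  rule_name="$1"; network="$2"; pod_cidr="$3"; node_tag="$4"
  run gcloud compute firewall-rules create "$rule_name" \
    --network="$network" \
    --direction=INGRESS \
    --allow=tcp:22 \
    --source-ranges="$pod_cidr" \
    --target-tags="$node_tag"
}
```

The Pod range can be read with `gcloud container clusters describe CLUSTER --zone ZONE --format='value(clusterIpv4Cidr)'`, and a node's network tags with `gcloud compute instances describe NODE --zone ZONE --format='value(tags.items)'`; running with `DRY_RUN=1` first shows the exact command before it touches the project.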
