I’m trying to use the Kubernetes DaemonSet rolling update to automatically roll out changes whenever the DaemonSet's spec.template field changes. I intentionally set an invalid image for the pods so that they cannot start correctly. I expected the rolling update to stop once the number of unavailable pods exceeds the number defined in maxUnavailable. Unfortunately, that doesn't happen: the pods keep being updated until all of them enter CrashLoopBackOff.
I ran my test in a 3-node environment: kubectl get node -A
NAME                         STATUS   ROLES    AGE   VERSION
wdc-rdops-vm05-dhcp-74-190   Ready    <none>   65d   v1.18.0
wdc-rdops-vm05-dhcp-86-61    Ready    master   65d   v1.18.0
wdc-rdops-vm05-dhcp-93-214   Ready    <none>   65d   v1.18.0
I found a similar thread: How to automatically stop rolling update when CrashLoopBackOff? But that one is about a Deployment, while my case is a DaemonSet.
As suggested in the thread, I've added
spec:
minReadySeconds: 120
so that the containers must run correctly for that long before a pod is counted as available rather than unavailable.
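For reference, this is how it combines with the update strategy that was already defined (both values also appear in the full YAML further down):

spec:
  minReadySeconds: 120
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1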
At first, only the newly updated pod crashed:
nsx-system nsx-node-agent-9cl2v 0/3 CrashLoopBackOff 3 23s
nsx-system nsx-node-agent-c95wb 3/3 Running 3 11m
nsx-system nsx-node-agent-p58vs 3/3 Running 3 11m
The first updated pod was unhealthy for more than 120 seconds, so it should have been counted as unavailable. However, the update was not stopped as expected; it kept going until all pods were replaced and crashed:
nsx-system nsx-node-agent-9cl2v 0/3 CrashLoopBackOff 45 15m
nsx-system nsx-node-agent-6mlmq 0/3 CrashLoopBackOff 48 2m46s
nsx-system nsx-node-agent-9fzcc 0/3 CrashLoopBackOff 57 2m59s
The complete DaemonSet YAML: kubectl get ds -n nsx-system nsx-node-agent -o yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
creationTimestamp: "2021-02-21T11:28:03Z"
generation: 101
labels:
component: nsx-node-agent
tier: nsx-networking
version: v1
  managedFields: [] # omitted for brevity
name: nsx-node-agent
namespace: nsx-system
resourceVersion: "14594084"
selfLink: /apis/apps/v1/namespaces/nsx-system/daemonsets/nsx-node-agent
uid: e3dd0951-1b31-4095-8c27-56ec9780d94e
spec:
minReadySeconds: 120
revisionHistoryLimit: 10
selector:
matchLabels:
component: nsx-node-agent
tier: nsx-networking
version: v1
template:
metadata:
annotations:
container.apparmor.security.beta.kubernetes.io/nsx-node-agent: localhost/node-agent-apparmor
creationTimestamp: null
labels:
component: nsx-node-agent
tier: nsx-networking
version: v1
spec:
containers:
- command:
- start_node_agent
env:
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: CONTAINER_NAME
value: nsx-node-agent
image: registry.access.redhat.com/ubi8/ubi:latest
imagePullPolicy: IfNotPresent
livenessProbe:
exec:
command:
- /bin/sh
- -c
- check_pod_liveness nsx-node-agent 5
failureThreshold: 5
initialDelaySeconds: 60
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
name: nsx-node-agent
resources: {}
securityContext:
capabilities:
add:
- NET_ADMIN
- SYS_ADMIN
- SYS_PTRACE
- DAC_READ_SEARCH
- NET_RAW
- AUDIT_WRITE
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/nsx-ujo
name: projected-volume
readOnly: true
- mountPath: /var/run/openvswitch
name: openvswitch
- mountPath: /var/run/nsx-ujo
name: var-run-ujo
- mountPath: /host/var/run/netns
mountPropagation: HostToContainer
name: netns
- mountPath: /host/proc
name: proc
readOnly: true
- mountPath: /var/lib/kubelet/device-plugins/
name: device-plugins
readOnly: true
- mountPath: /host/etc/os-release
name: host-os-release
readOnly: true
- mountPath: /var/log/nsx-ujo
name: host-var-log-ujo
- command:
- start_kube_proxy
env:
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: CONTAINER_NAME
value: nsx-kube-proxy
image: registry.access.redhat.com/ubi8/ubi:latest
imagePullPolicy: IfNotPresent
livenessProbe:
exec:
command:
- /bin/sh
- -c
- check_pod_liveness nsx-kube-proxy 5
failureThreshold: 5
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
name: nsx-kube-proxy
resources: {}
securityContext:
capabilities:
add:
- NET_ADMIN
- SYS_ADMIN
- SYS_PTRACE
- DAC_READ_SEARCH
- NET_RAW
- AUDIT_WRITE
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/nsx-ujo
name: projected-volume
readOnly: true
- mountPath: /var/run/openvswitch
name: openvswitch
- mountPath: /var/log/nsx-ujo
name: host-var-log-ujo
- command:
- start_ovs
image: registry.access.redhat.com/ubi8/ubi:latest
imagePullPolicy: IfNotPresent
livenessProbe:
exec:
command:
- /bin/sh
- -c
- check_pod_liveness nsx-ovs 10
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
name: nsx-ovs
resources: {}
securityContext:
capabilities:
add:
- NET_ADMIN
- SYS_ADMIN
- SYS_NICE
- SYS_MODULE
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/nsx-ujo
name: projected-volume
readOnly: true
- mountPath: /etc/openvswitch
name: var-run-ujo
subPath: openvswitch-db
- mountPath: /var/run/openvswitch
name: openvswitch
- mountPath: /sys
name: host-sys
readOnly: true
- mountPath: /host/etc/openvswitch
name: host-original-ovs-db
- mountPath: /lib/modules
name: host-modules
readOnly: true
- mountPath: /host/etc/os-release
name: host-os-release
readOnly: true
- mountPath: /var/log/openvswitch
name: host-var-log-ujo
subPath: openvswitch
- mountPath: /var/log/nsx-ujo
name: host-var-log-ujo
dnsPolicy: ClusterFirst
hostNetwork: true
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: nsx-node-agent-svc-account
serviceAccountName: nsx-node-agent-svc-account
terminationGracePeriodSeconds: 60
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/master
- effect: NoSchedule
key: node.kubernetes.io/not-ready
- effect: NoSchedule
key: node.kubernetes.io/unreachable
volumes:
- name: projected-volume
projected:
defaultMode: 420
sources:
- configMap:
items:
- key: ncp.ini
path: ncp.ini
name: nsx-node-agent-config
- configMap:
items:
- key: version
path: VERSION
name: nsx-ncp-version-config
- hostPath:
path: /var/run/openvswitch
type: ""
name: openvswitch
- hostPath:
path: /var/run/nsx-ujo
type: ""
name: var-run-ujo
- hostPath:
path: /var/run/netns
type: ""
name: netns
- hostPath:
path: /proc
type: ""
name: proc
- hostPath:
path: /var/lib/kubelet/device-plugins/
type: ""
name: device-plugins
- hostPath:
path: /var/log/nsx-ujo
type: DirectoryOrCreate
name: host-var-log-ujo
- hostPath:
path: /sys
type: ""
name: host-sys
- hostPath:
path: /lib/modules
type: ""
name: host-modules
- hostPath:
path: /etc/openvswitch
type: ""
name: host-original-ovs-db
- hostPath:
path: /etc/os-release
type: ""
name: host-os-release
updateStrategy:
rollingUpdate:
maxUnavailable: 1
type: RollingUpdate
status:
currentNumberScheduled: 3
desiredNumberScheduled: 3
numberMisscheduled: 0
numberReady: 0
numberUnavailable: 3
observedGeneration: 101
updatedNumberScheduled: 3
The DaemonSet output is as below: kc get ds -n nsx-system -w
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
nsx-node-agent 3 3 0 3 0 <none> 64d
I don't understand why Kubernetes didn't stop the update when the number of unavailable pods was greater than maxUnavailable: 1.
In addition, we can see that the pods' age is far greater than minReadySeconds.
Seemingly, the Kubernetes rolling update strategy doesn't follow the defined spec? It shouldn't allow this situation to happen during a rolling update.
I don't see readiness probes defined in your manifests. Without readiness probes, Kubernetes will consider a pod to be "ready" as soon as the process is running, and will proceed with terminating other pods during a RollingUpdate.
A failing readiness probe on one pod with maxUnavailable set to 1 should stop the update, but if there is no such probe, there is nothing informing the cluster that the pod is not actually ready to accept traffic.
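For example, a readiness probe could sit alongside each container's existing liveness probe. Here is a minimal sketch for the nsx-node-agent container, assuming (as an illustration only, not something from your manifest) that your check_pod_liveness script is also usable as a readiness check:

        readinessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - check_pod_liveness nsx-node-agent 5
          initialDelaySeconds: 60
          periodSeconds: 10
          failureThreshold: 3
          successThreshold: 1
          timeoutSeconds: 5

With such a probe in place, an updated pod that never becomes ready should keep the DaemonSet's numberUnavailable at 1, giving maxUnavailable: 1 something to act on before the controller moves on to the next node.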