Hi all, I'm finishing the installation of a brand-new cluster and I'm facing a strange issue.
I'm deploying the NGINX Ingress Controller (nginx-ingress) both via manifests and via the Helm chart, but both give the same result:
kubectl get po -o wide --watch
NAME                            READY   STATUS             RESTARTS        AGE     IP             NODE     NOMINATED NODE   READINESS GATES
nginx-ingress-dx6bg             0/1     Running            3 (26s ago)     3m44s   10.244.4.118   node-2   <none>           <none>
nginx-ingress-gqkhz             0/1     Running            3 (29s ago)     3m47s   10.244.3.16    node-1   <none>           <none>
nginx-ingress-dx6bg             0/1     Error              3 (86s ago)     4m44s   10.244.4.118   node-2   <none>           <none>
nginx-ingress-gqkhz             0/1     Error              3 (89s ago)     4m47s   10.244.3.16    node-1   <none>           <none>
nginx-ingress-dx6bg             0/1     CrashLoopBackOff   3 (12s ago)     4m56s   10.244.4.118   node-2   <none>           <none>
nginx-ingress-gqkhz             0/1     CrashLoopBackOff   3 (13s ago)     4m59s   10.244.3.16    node-1   <none>           <none>
nginx-ingress-gqkhz             0/1     Running            4 (44s ago)     5m30s   10.244.3.16    node-1   <none>           <none>
nginx-ingress-dx6bg             0/1     Running            4 (51s ago)     5m35s   10.244.4.118   node-2   <none>           <none>
nginx-ingress-b9fcfbb59-hwjc8   0/1     Running            6 (2m49s ago)   12m     10.244.4.116   node-2   <none>           <none>
Describing the pod, the issue is in the readiness probe:
kubectl describe po -n nginx-ingress nginx-ingress-b9fcfbb59-hwjc8
Name:             nginx-ingress-b9fcfbb59-hwjc8
Namespace:        nginx-ingress
Priority:         0
Service Account:  nginx-ingress
Node:             node-2/192.168.17.15
Start Time:       Thu, 08 Feb 2024 17:09:37 +0100
Labels:           app=nginx-ingress
                  app.kubernetes.io/name=nginx-ingress
                  app.kubernetes.io/version=3.4.2
                  app.nginx.org/version=1.25.3
                  pod-template-hash=b9fcfbb59
Annotations:      <none>
Status:           Running
SeccompProfile:   RuntimeDefault
IP:               10.244.4.116
IPs:
  IP:           10.244.4.116
Controlled By:  ReplicaSet/nginx-ingress-b9fcfbb59
Containers:
  nginx-ingress:
    Container ID:  containerd://57299408237d9d8b1b7be67ac12d6999640ff2249305c8d289a78a58fe6b38c9
    Image:         nginx/nginx-ingress:3.4.2
    Image ID:      docker.io/nginx/nginx-ingress@sha256:4b97f1d3466c804d51abbdeb84f2c7c3ea00d6a937a320d62a4cf9d6b447d6ad
    Ports:         80/TCP, 443/TCP, 8081/TCP, 9113/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Args:
      -nginx-configmaps=$(POD_NAMESPACE)/nginx-config
    State:          Running
      Started:      Thu, 08 Feb 2024 17:17:51 +0100
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Thu, 08 Feb 2024 17:15:30 +0100
      Finished:     Thu, 08 Feb 2024 17:16:30 +0100
    Ready:          False
    Restart Count:  5
    Requests:
      cpu:     100m
      memory:  128Mi
    Readiness:  http-get http://:readiness-port/nginx-ready delay=0s timeout=1s period=1s #success=1 #failure=3
    Environment:
      POD_NAMESPACE:  nginx-ingress (v1:metadata.namespace)
      POD_NAME:       nginx-ingress-b9fcfbb59-hwjc8 (v1:metadata.name)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vlfd8 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-vlfd8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                      From               Message
  ----     ------     ----                     ----               -------
  Normal   Scheduled  8m57s                    default-scheduler  Successfully assigned nginx-ingress/nginx-ingress-b9fcfbb59-hwjc8 to node-2
  Normal   Pulling    8m57s                    kubelet            Pulling image "nginx/nginx-ingress:3.4.2"
  Normal   Pulled     8m35s                    kubelet            Successfully pulled image "nginx/nginx-ingress:3.4.2" in 21.588s (21.589s including waiting)
  Normal   Created    8m35s                    kubelet            Created container nginx-ingress
  Normal   Started    8m35s                    kubelet            Started container nginx-ingress
  Warning  Unhealthy  3m56s (x250 over 8m34s)  kubelet            Readiness probe failed: Get "http://10.244.4.116:8081/nginx-ready": dial tcp 10.244.4.116:8081: connect: connection refused
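For what it's worth, the probe endpoint can be tested by hand (just a sketch, not from any official docs; the pod IP comes from the describe output above, and it assumes curl is present in the image):

# from a node that can route to the pod network
curl -v http://10.244.4.116:8081/nginx-ready
# or from inside the container itself (assumes curl exists in the image)
kubectl exec -n nginx-ingress nginx-ingress-b9fcfbb59-hwjc8 -- curl -s http://127.0.0.1:8081/nginx-ready

If it refuses connections even on 127.0.0.1, the controller process never became healthy; if it answers locally but not from the node, the pod network is the suspect.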
Following a known issue acknowledged by NGINX Corp, I instructed Helm to raise the timeouts, but without any positive result:
helm install nginx-ingress-controller nginx-stable/nginx-ingress --set rbac.create=true --set controller.nodeSelector."kubernetes\.io/hostname"=node-2 --set controller.nginxReloadTimeout=20000
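(Side note: nginxReloadTimeout only covers config reloads, while what is failing here is the readiness probe. The chart can also delay the probe itself; a sketch, assuming your chart version exposes the controller.readyStatus.* values:)

helm upgrade --install nginx-ingress-controller nginx-stable/nginx-ingress \
  --set rbac.create=true \
  --set controller.nodeSelector."kubernetes\.io/hostname"=node-2 \
  --set controller.readyStatus.initialDelaySeconds=30   # give the controller 30s before the first probe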
Do you have any suggestions, possibly without resetting the whole cluster? On a different cluster the same setup worked correctly.
Looking at it purely from a deployment perspective: first rule out a resource issue. Run a describe on the stopped nginx-ingress-gqkhz or nginx-ingress-dx6bg replicas and check the error. I'd also suggest scaling down to 1 or 2 replicas and seeing whether the container starts; see the commands sketched below. A failing readiness probe on its own doesn't tell you much.
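Something along these lines (the workload name is a guess; check kubectl get deploy,ds -n nginx-ingress for the real owner):

kubectl describe po nginx-ingress-gqkhz -n nginx-ingress
# if the pods belong to a Deployment (the name below is hypothetical):
kubectl scale deploy nginx-ingress -n nginx-ingress --replicas=1
# note: dx6bg/gqkhz look like DaemonSet pods, and a DaemonSet can't be scaled this way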
Also, while the container shows as Running, read the logs (kubectl logs <pod-name> -c <container-name>); that might give you some info.
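With a pod stuck in a restart loop, the previous container's logs are usually the interesting ones:

kubectl logs nginx-ingress-b9fcfbb59-hwjc8 -n nginx-ingress
kubectl logs nginx-ingress-b9fcfbb59-hwjc8 -n nginx-ingress --previous   # logs from the last crashed run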
Although I see CrashLoopBackOff on some replicas, registry connectivity can be ruled out, since some of the replicas have pulled the image successfully.
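That said, a successful image pull only proves node-to-registry connectivity; a refused connection on the pod IP can still be a pod-network problem. A quick CNI sanity check (a sketch; Flannel is an assumption, inferred from the 10.244.0.0/16 pod CIDR):

kubectl get po -n kube-flannel -o wide   # namespace may be kube-system on older Flannel setups; all pods should be Running
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'   # each node should have a pod CIDR assigned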