We have an EKS cluster on 1.21 (we are upgrading it to 1.24 soon) where a pod keeps getting restarted at regular intervals. I checked the logs and memory usage but don't see anything that points to the reason for the restarts.
I see this in the pod's events:
LAST SEEN TYPE REASON OBJECT MESSAGE
52m Warning Unhealthy pod/backend-6cc49d746-ztnvv Readiness probe failed: Get "http://192.168.29.43:80/users/sign_in": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
52m Warning Unhealthy pod/backend-6cc49d746-ztnvv Readiness probe failed: Get "http://192.168.29.43:3000/users/sign_in": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
52m Warning Unhealthy pod/backend-6cc49d746-ztnvv Liveness probe failed: Get "http://192.168.29.43:3000/users/sign_in": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
52m Warning Unhealthy pod/backend-6cc49d746-ztnvv Liveness probe failed: Get "http://192.168.29.43:80/users/sign_in": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
My readiness and liveness probes simply check that the sign-in page loads. This has been working for a long time, but suddenly we are noticing the restart count climbing.
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /users/sign_in
    port: 80
    scheme: HTTP
  periodSeconds: 15
  successThreshold: 1
  timeoutSeconds: 5
name: nginx
ports:
  - containerPort: 80
    protocol: TCP
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /users/sign_in
    port: 80
    scheme: HTTP
  periodSeconds: 15
  successThreshold: 1
  timeoutSeconds: 5
I see this when I describe the pod while it is in a restart loop:
Containers:
  1:
    Container ID:   docker://cf5b2086db6d55f
    Image:          60
    Image ID:       1
    Port:           3000/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sun, 17 Sep 2023 17:01:21 +0200
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Sun, 17 Sep 2023 16:01:21 +0200
      Finished:     Sun, 17 Sep 2023 17:01:18 +0200
    Ready:          True
    Restart Count:  3
It looks like exit code 137 is what you get when a container uses too much memory, but I have not specified any memory limit. What default is it using? Could memory be the issue here that is causing the restarts?
I am not sure in which direction to investigate to resolve this, so any help would be great.
The error "Client.Timeout exceeded while awaiting headers" means Kubernetes considered the probe failed because the endpoint did not respond within the configured time. Note that exit code 137 is 128 + 9 (SIGKILL); it shows up both for OOM kills and when the kubelet kills a container whose liveness probe keeps failing, so it does not necessarily point to a memory problem here.
All that needs to be done is to increase timeoutSeconds (for example to 10 seconds) for both the livenessProbe and the readinessProbe.
timeoutSeconds: This parameter is part of the configuration for both liveness and readiness probes. It specifies the number of seconds after which the probe times out; the default value is 1 second. If a probe does not respond within timeoutSeconds, Kubernetes considers the probe to have failed.
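A minimal sketch of the adjusted probe configuration, keeping your existing path and port and only raising timeoutSeconds (10 is an example value, not a recommendation; tune it to how long /users/sign_in actually takes to respond under load):

livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /users/sign_in
    port: 80
    scheme: HTTP
  periodSeconds: 15
  successThreshold: 1
  timeoutSeconds: 10   # raised from 5; example value, adjust to your endpoint's real response time
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /users/sign_in
    port: 80
    scheme: HTTP
  periodSeconds: 15
  successThreshold: 1
  timeoutSeconds: 10   # raised from 5

To get a rough sense of how long the endpoint actually takes, you could time it from inside the pod, e.g. kubectl exec <pod> -c nginx -- curl -o /dev/null -s -w '%{time_total}\n' http://localhost:80/users/sign_in (assuming curl is available in the image); if the response time regularly approaches the timeout, increasing timeoutSeconds only hides a slow endpoint.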