I'm running a WebService backend application in Kubernetes (GKE). It is used only by our frontend Web app. Typically there are sequences of tens of requests coming from the same user (ClientIP). My app is set up to run at least 2 instances ("minReplicas: 2").
The problem:
From the logs I can see situations where one pod is overloaded (receiving many requests) while the other is idle, even though both pods are in the Ready state.
My attempt to fix it: I tried adding a custom readiness health check that returns an "Unhealthy" status when there are too many open connections. But even after the health check returns "Unhealthy", the load balancer keeps sending requests to the same pod while the second (healthy) pod stays idle.
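To verify whether the failing probe actually takes effect, I can watch the Service endpoints; a pod that is not Ready should drop out of the list (this is just a check I can run, XXX being the Service name below):

$ kubectl get endpoints XXX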
Here is an excerpt from service.yaml:
kind: Service
metadata:
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
sessionAffinity is not specified, so I expect it defaults to "None".
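For reference, spelling that default out explicitly would look like this (a sketch of the field only, not something I have changed):

spec:
  sessionAffinity: None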
My questions: What am I doing wrong? Does the readiness health check have any effect on the load balancer? How can I control the distribution of requests?
Additional information:
Cluster creation:
gcloud container --project %PROJECT% clusters create %CLUSTER% ^
  --zone "us-east1-b" --release-channel "stable" --machine-type "n1-standard-2" ^
  --disk-type "pd-ssd" --disk-size "20" --metadata disable-legacy-endpoints=true ^
  --scopes "storage-rw" --num-nodes "1" --enable-stackdriver-kubernetes ^
  --enable-ip-alias --network "xxx" --subnetwork "xxx" ^
  --cluster-secondary-range-name "xxx" --services-secondary-range-name "xxx" ^
  --no-enable-master-authorized-networks
Node Pool:
gcloud container node-pools create XXX --project %PROJECT% --zone="us-east1-b" ^
  --cluster=%CLUSTER% --machine-type=c2-standard-4 --max-pods-per-node=16 ^
  --num-nodes=1 --disk-type="pd-ssd" --disk-size="10" --scopes="storage-full" ^
  --enable-autoscaling --min-nodes=1 --max-nodes=30
Service:
apiVersion: v1
kind: Service
metadata:
  name: XXX
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
  labels:
    app: XXX
    version: v0.1
spec:
  selector:
    app: XXX
    version: v0.1
  type: LoadBalancer
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
HPA:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: XXX
spec:
  scaleTargetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: XXX
  minReplicas: 2
  maxReplicas: 30
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 40
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: XXX
  labels:
    app: XXX
    version: v0.1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: XXX
      version: v0.1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: XXX
        version: v0.1
    spec:
      containers:
      - image: XXX
        name: XXX
        imagePullPolicy: Always
        resources:
          requests:
            memory: "10Gi"
            cpu: "3200m"
          limits:
            memory: "10Gi"
            cpu: "3600m"
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8080
          initialDelaySeconds: 3
          periodSeconds: 8
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8080
          initialDelaySeconds: 120
          periodSeconds: 30
      nodeSelector:
        cloud.google.com/gke-nodepool: XXX
Posting this community wiki answer to expand on the comment I made about the reproduction steps.
The reproduction steps I've followed:
$ kubectl get nodes
$ kubectl get pods -o wide
The testing was done from a VM in the same zone that has access to the Internal Load Balancer. The tool/command used:
$ ab -n 100000 http://INTERNAL_LB_IP_ADDRESS/
The logs showed the number of requests handled by each pod.
With the internal load balancer, the traffic should be split evenly between the backends (by default it uses the CONNECTION balancing mode). Note that what gets balanced are connections, not individual HTTP requests, so all requests sent over a reused (keep-alive) connection land on the same pod.

There could be many possible reasons why the traffic is not evenly distributed, for example:
- a replica of the app is not in the Ready state;
- a Node is in an unhealthy state.

It could be useful to check if the same situation happens in different scenarios (different cluster, different image, etc.).
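One additional scenario that could be worth testing (my suggestion, not part of the original reproduction): rerun the benchmark with concurrency and HTTP keep-alive enabled, since reused connections are balanced only once and can make the per-pod request counts uneven:

$ ab -n 100000 -c 10 -k http://INTERNAL_LB_IP_ADDRESS/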
It could also be a good idea to check the details about the Service and the Pods in the Cloud Console:
Cloud Console (Web UI) -> Kubernetes Engine -> Services & Ingress -> SERVICE_NAME -> Serving pods
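Roughly the same information can be retrieved from the command line (SERVICE_NAME is a placeholder):

$ kubectl describe service SERVICE_NAME
$ kubectl get endpoints SERVICE_NAME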