What I want to accomplish
I'm trying to connect an external HTTPS (L7) load balancer with an NGINX Ingress exposed as a zonal Network Endpoint Group (NEG). My Kubernetes cluster (in GKE) contains a couple of web application deployments that I've exposed as a ClusterIP service.
I know that the NGINX Ingress object can be directly exposed as a TCP load balancer. But this is not what I want. Instead, in my architecture, I want to load balance the HTTPS requests with an external HTTPS load balancer. I want this external load balancer to provide SSL/TLS termination and forward plain HTTP requests to my Ingress resource.
The ideal architecture would look like this:
HTTPS requests --> external HTTPS load balancer --> HTTP request --> NGINX Ingress zonal NEG --> appropriate web application
I'd like to add the zonal NEGs from the NGINX Ingress as the backends for the HTTPS load balancer. This is where things fall apart.
What I've done
NGINX Ingress config
I'm using the default NGINX Ingress config from the official kubernetes/ingress-nginx project, specifically this YAML file: https://github.com/kubernetes/ingress-nginx/blob/master/deploy/static/provider/cloud/deploy.yaml. Note that I've changed the NGINX controller Service section as follows:
- Added the NEG annotation.
- Changed the Service type from LoadBalancer to ClusterIP.
# Source: ingress-nginx/templates/controller-service.yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    # added NEG annotation
    cloud.google.com/neg: '{"exposed_ports": {"80":{"name": "NGINX_NEG"}}}'
  labels:
    helm.sh/chart: ingress-nginx-3.30.0
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.46.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  type: ClusterIP
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: http
    - name: https
      port: 443
      protocol: TCP
      targetPort: https
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/component: controller
---
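For what it's worth, a quick way to confirm that the NEG controller actually created the zonal NEGs after applying this Service (the --filter expression is just an assumed way to narrow the list):

kubectl get svcneg --namespace=ingress-nginx
gcloud compute network-endpoint-groups list --filter="name=NGINX_NEG"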
NGINX Ingress routing
I've tested the path-based routing rules for the NGINX Ingress to my web applications independently. This works when the NGINX Ingress is configured with a TCP load balancer. I've set up my application Deployment and Service configs the usual way; the routing rules look roughly like the sketch below.
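This is only a minimal sketch of the kind of Ingress I mean; the path and the Service name (app1-service) are placeholders, not my actual config:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-apps                      # placeholder name
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - http:
        paths:
          - path: /app1               # placeholder path
            pathType: Prefix
            backend:
              service:
                name: app1-service    # placeholder Service
                port:
                  number: 80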
External HTTPS Load Balancer
I created an external HTTPS load balancer with the following settings:
- Backend: added the zonal NEGs named NGINX_NEG as the backends. The backend is configured to accept HTTP requests on port 80.
- Health check: configured on the serving port via the TCP protocol.
- Firewall: added rules to allow incoming traffic from 130.211.0.0/22 and 35.191.0.0/16, as mentioned here: https://cloud.google.com/kubernetes-engine/docs/how-to/standalone-neg#traffic_does_not_reach_the_endpoints (a gcloud sketch of this setup follows the list).
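For reference, this is roughly what that setup looks like in gcloud for a single zone; all resource names here (nginx-neg-hc, nginx-backend, allow-lb-health-checks) are placeholders I'm using for illustration:

# TCP health check on the serving port, as described above
gcloud compute health-checks create tcp nginx-neg-hc --use-serving-port

# global backend service that speaks HTTP to the NEG endpoints
gcloud compute backend-services create nginx-backend \
    --protocol=HTTP \
    --health-checks=nginx-neg-hc \
    --global

# attach one zonal NEG (repeat per zone); the rate limit is an arbitrary placeholder
gcloud compute backend-services add-backend nginx-backend \
    --network-endpoint-group=NGINX_NEG \
    --network-endpoint-group-zone=us-central1-a \
    --balancing-mode=RATE \
    --max-rate-per-endpoint=100 \
    --global

# allow health check traffic from Google's probe ranges
gcloud compute firewall-rules create allow-lb-health-checks \
    --network=default \
    --source-ranges=130.211.0.0/22,35.191.0.0/16 \
    --allow=tcp:80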
What's not working
Soon after the external load balancer is set up, I can see that GCP creates a new endpoint under one of the zonal NEGs, but it shows as "Unhealthy". Requests to the external HTTPS load balancer return a 502 error.
I'm not sure where to start debugging this configuration in GCP logging. I've enabled logging for the health check but nothing shows up in the logs.
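For reference, this is roughly how I enabled logging on the health check (the health check name is a placeholder for mine):

gcloud compute health-checks update tcp nginx-neg-hc --enable-logging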
I also configured the health check on the /healthz path of the NGINX Ingress controller. That didn't seem to work either.
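For completeness, the /healthz variant I tried was an HTTP health check rather than a TCP one, roughly like this (again, the name is a placeholder):

gcloud compute health-checks create http nginx-neg-hc-http \
    --use-serving-port \
    --request-path=/healthz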
Any tips on how to get this to work will be much appreciated. Thanks!
Edit 1: As requested, I ran kubectl get svcneg -o yaml --namespace=<namespace>; here's the output:
apiVersion: networking.gke.io/v1beta1
kind: ServiceNetworkEndpointGroup
metadata:
  creationTimestamp: "2021-05-07T19:04:01Z"
  finalizers:
    - networking.gke.io/neg-finalizer
  generation: 418
  labels:
    networking.gke.io/managed-by: neg-controller
    networking.gke.io/service-name: ingress-nginx-controller
    networking.gke.io/service-port: "80"
  name: NGINX_NEG
  namespace: ingress-nginx
  ownerReferences:
    - apiVersion: v1
      blockOwnerDeletion: false
      controller: true
      kind: Service
      name: ingress-nginx-controller
      uid: <unique ID>
  resourceVersion: "2922506"
  selfLink: /apis/networking.gke.io/v1beta1/namespaces/ingress-nginx/servicenetworkendpointgroups/NGINX_NEG
  uid: <unique ID>
spec: {}
status:
  conditions:
    - lastTransitionTime: "2021-05-07T19:04:08Z"
      message: ""
      reason: NegInitializationSuccessful
      status: "True"
      type: Initialized
    - lastTransitionTime: "2021-05-07T19:04:10Z"
      message: ""
      reason: NegSyncSuccessful
      status: "True"
      type: Synced
  lastSyncTime: "2021-05-10T15:02:06Z"
  networkEndpointGroups:
    - id: <id1>
      networkEndpointType: GCE_VM_IP_PORT
      selfLink: https://www.googleapis.com/compute/v1/projects/<project>/zones/us-central1-a/networkEndpointGroups/NGINX_NEG
    - id: <id2>
      networkEndpointType: GCE_VM_IP_PORT
      selfLink: https://www.googleapis.com/compute/v1/projects/<project>/zones/us-central1-b/networkEndpointGroups/NGINX_NEG
    - id: <id3>
      networkEndpointType: GCE_VM_IP_PORT
      selfLink: https://www.googleapis.com/compute/v1/projects/<project>/zones/us-central1-f/networkEndpointGroups/NGINX_NEG
As I understand it, your issue is: when the external load balancer is set up, GCP creates a new endpoint under one of the zonal NEGs, the endpoint shows as "Unhealthy", and requests to the external HTTPS load balancer return a 502 error.
Essentially, the Service annotation cloud.google.com/neg: '{"ingress": true}' enables container-native load balancing. After the Ingress is created, an HTTP(S) load balancer is created in the project, and NEGs are created in each zone in which the cluster runs. The endpoints in the NEGs and the endpoints of the Service are kept in sync. See [1].
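For illustration, a minimal sketch of a Service annotated this way (the name, selector, and ports are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: my-service                    # placeholder
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
spec:
  type: ClusterIP
  selector:
    app: my-app                       # placeholder
  ports:
    - port: 80
      targetPort: 8080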
New endpoints generally become reachable after attaching them to the load balancer, provided that they respond to health checks. You might encounter 502 errors or rejected connections if traffic cannot reach the endpoints.
One of your endpoints in the zonal NEG is showing as unhealthy, so please confirm the status of the other endpoints and how many endpoints are spread across the zones in the backend. If all backends are unhealthy, then your firewall, Ingress, or Service might be misconfigured.
You can run the following command to check whether your endpoints are healthy (see [2]):

gcloud compute network-endpoint-groups list-network-endpoints NAME \
    --zone=ZONE
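To see the health of those endpoints as the load balancer sees them, you can also query the backend service directly (replace the backend service name with yours):

gcloud compute backend-services get-health BACKEND_SERVICE_NAME --global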
To troubleshoot traffic that is not reaching the endpoints, verify that the health check firewall rules allow incoming TCP traffic to your endpoints from the 130.211.0.0/22 and 35.191.0.0/16 ranges. But as you mentioned, you have already configured these rules correctly. See [3] for health check configuration.
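One way to double-check is to list the rules and describe the relevant one, which shows the source ranges and allowed ports:

gcloud compute firewall-rules list
gcloud compute firewall-rules describe RULE_NAME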
Run a curl command against your load balancer IP to check the response:

curl [LB IP]
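Adding -v makes the failure easier to inspect; a 502 generated by the load balancer itself typically carries a via: 1.1 google response header:

curl -v http://[LB IP]/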
[1] https://cloud.google.com/kubernetes-engine/docs/concepts/ingress-xlb
[2] https://cloud.google.com/load-balancing/docs/negs/zonal-neg-concepts#troubleshooting
[3] https://cloud.google.com/kubernetes-engine/docs/concepts/ingress#health_checks