HAProxy Ingress Controller Service Changed IP on GCP


I am using HAProxy as the ingress controller in my GKE clusters, and I am exposing the HAProxy service as an internal LoadBalancer service.

Recently, I experienced an issue where the HAProxy service changed its EXTERNAL-IP and traffic stopped routing to HAProxy. This happened multiple times on different days (it has since stopped). Each time, I had to manually add the new external IP to the load balancer's frontend to allow traffic to reach HAProxy.
There were two HAProxy pods, both of which had been running for days, and there was nothing in their logs. I assume the problem was related to the Service or the GCP load balancer rather than HAProxy itself.
Unfortunately, I don't have any logs from that time.

I still don't know what caused the service IP to change. There were no recent changes, and the cluster and all services had been running properly for many days before this suddenly occurred.

Has anyone faced a similar issue before? What can I do to avoid it in the future?
What could have caused the IP to change?

This is how my service is configured:

---
apiVersion: v1
kind: Service
metadata:
  labels:
    run: haproxy-ingress
  name: haproxy-ingress
  namespace: haproxy-controller
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
    networking.gke.io/internal-load-balancer-allow-global-access: "true"
    cloud.google.com/network-tier: "Premium"
spec:
  selector:
    run: haproxy-ingress
  type: LoadBalancer
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
  - name: stat
    port: 1024
    protocol: TCP
    targetPort: 1024
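
For reference, the IP currently assigned and the events GKE records against the Service can be checked with kubectl (a sketch; the name and namespace are taken from the manifest above):

# Show the Service and its current EXTERNAL-IP
kubectl get service haproxy-ingress -n haproxy-controller -o wide

# Show events recorded against the Service (EnsuringLoadBalancer, SyncLoadBalancerFailed, ...)
kubectl describe service haproxy-ingress -n haproxy-controller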

I later found these events on the Service:

Warning SyncLoadBalancerFailed 30m (x3570 over 13d) service-controller Error syncing load balancer: failed to ensure load balancer: googleapi: Error 409: IP_IN_USE_BY_ANOTHER_RESOURCE - IP '10.17.129.17' is already being used by another resource.
Normal EnsuringLoadBalancer 3m33s (x3576 over 13d) service-controller Ensuring load balancer
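
One way to see which resource was holding that address at the time (a sketch; the exact filter fields are an assumption on my side):

# Is the IP a reserved address owned by something else?
gcloud compute addresses list --filter="address=10.17.129.17"

# Is another forwarding rule (load balancer frontend) using it?
gcloud compute forwarding-rules list --filter="IPAddress:10.17.129.17"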
There are 2 answers below.

BEST ANSWER

The short answer is: the external IP of the service is ephemeral.
Because the HAProxy controller pods are recreated, the HAProxy service comes back up with a new ephemeral IP.

To avoid this issue, I would recommend using a static IP that you can reference in the loadBalancerIP field.

This can be done with the following steps:

  • Reserve a static IP (link); an example command is sketched after this list.
  • Use this IP when creating the service. (link)
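
A minimal sketch of reserving a static internal IP (haproxy-ingress-ip, REGION and SUBNET are placeholders, not values from your setup):

# Reserve a regional static internal IP in the subnet used by the cluster
gcloud compute addresses create haproxy-ingress-ip \
    --region=REGION \
    --subnet=SUBNET

# Read back the reserved address to put into loadBalancerIP
gcloud compute addresses describe haproxy-ingress-ip \
    --region=REGION --format='value(address)'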

Example YAML:

apiVersion: v1
kind: Service
metadata:
  name: helloweb
  labels:
    app: hello
spec:
  selector:
    app: hello
    tier: web
  ports:
  - port: 80
    targetPort: 8080
  type: LoadBalancer
  loadBalancerIP: "YOUR.IP.ADDRESS.HERE"
SECOND ANSWER

Unfortunately, without logs it's hard to say anything for sure. Check the audit logs that GKE ships to Cloud Logging; they might give you some idea of what happened. One possibility is that GCP "oops"'d the load balancer and GKE recreated it, giving it a new IP. I've never heard of that happening with LBs, though (it happens pretty often with nodes, but not LBs). A more common case is that some kubectl command inadvertently removed the Service object and it was then recreated by whatever management layer you have set up (Argo, Flux, Helm Operator, whatever); delete plus recreate again means a new LB with a new IP. The latter case should be visible in the audit logs, so definitely check those.
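
If you want to check the delete-and-recreate theory, something like this Cloud Logging query can surface delete calls against the Service (a sketch; the methodName/resourceName matching is my assumption about how GKE writes Kubernetes audit entries):

gcloud logging read '
  resource.type="k8s_cluster"
  AND protoPayload.methodName:"services.delete"
  AND protoPayload.resourceName:"haproxy-ingress"
' --limit=20 --format=json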