Configuring Istio, Kubernetes and MetalLB to use an Istio LoadBalancer


I'm struggling with the last step of a configuration using MetalLB, Kubernetes and Istio on a bare-metal instance: having a web page returned from a service to the outside world via an Istio VirtualService route. I've just updated the instance to

  • MetalLB (version 0.7.3)
  • Kubernetes (version 1.12.2)
  • Istio (version 1.0.3)

I’ll start with what does work.

All complementary services have been deployed and most are working:

  1. Kubernetes Dashboard on http://localhost:8001
  2. Prometheus Dashboard on http://localhost:10010 (I had something else on 9009)
  3. Envoy Admin on http://localhost:15000
  4. Grafana (Istio Dashboard) on http://localhost:3000
  5. Jaeger on http://localhost:16686

I say most because since the upgrade to Istio 1.0.3 I've lost the telemetry from istio-ingressgateway in the Jaeger dashboard, and I'm not sure how to bring it back. I've deleted the pod and re-created it, to no avail.
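
(For context, those localhost URLs are just local proxies/port-forwards from my workstation, roughly along the lines below; the exact commands aren't important to the problem.)

# Kubernetes Dashboard via the API-server proxy (kubectl proxy listens on 8001 by default)
kubectl proxy

# The add-on dashboards are plain port-forwards, e.g.
kubectl -n istio-system port-forward svc/grafana 3000:3000
kubectl -n istio-system port-forward svc/prometheus 10010:9090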

Outside of that, MetalLB and K8S appear to be working fine and the load-balancer is configured correctly (using ARP).

kubectl get svc -n istio-system
NAME                     TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                                                                                                                   AGE
grafana                  ClusterIP      10.109.247.149   <none>          3000/TCP                                                                                                                  9d
istio-citadel            ClusterIP      10.110.129.92    <none>          8060/TCP,9093/TCP                                                                                                         28d
istio-egressgateway      ClusterIP      10.99.39.29      <none>          80/TCP,443/TCP                                                                                                            28d
istio-galley             ClusterIP      10.98.219.217    <none>          443/TCP,9093/TCP                                                                                                          28d
istio-ingressgateway     LoadBalancer   10.108.175.231   192.168.1.191   80:31380/TCP,443:31390/TCP,31400:31400/TCP,15011:30805/TCP,8060:32514/TCP,853:30601/TCP,15030:31159/TCP,15031:31838/TCP   28d
istio-pilot              ClusterIP      10.97.248.195    <none>          15010/TCP,15011/TCP,8080/TCP,9093/TCP                                                                                     28d
istio-policy             ClusterIP      10.98.133.209    <none>          9091/TCP,15004/TCP,9093/TCP                                                                                               28d
istio-sidecar-injector   ClusterIP      10.102.158.147   <none>          443/TCP                                                                                                                   28d
istio-telemetry          ClusterIP      10.103.141.244   <none>          9091/TCP,15004/TCP,9093/TCP,42422/TCP                                                                                     28d
jaeger-agent             ClusterIP      None             <none>          5775/UDP,6831/UDP,6832/UDP,5778/TCP                                                                                       27h
jaeger-collector         ClusterIP      10.104.66.65     <none>          14267/TCP,14268/TCP,9411/TCP                                                                                              27h
jaeger-query             LoadBalancer   10.97.70.76      192.168.1.193   80:30516/TCP                                                                                                              27h
prometheus               ClusterIP      10.105.176.245   <none>          9090/TCP                                                                                                                  28d
zipkin                   ClusterIP      None             <none>          9411/TCP                                                                                                                  27h
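
For reference, the MetalLB configuration is essentially the stock layer-2 setup, something along these lines (the exact pool boundaries here are illustrative; the point is that it hands out addresses in the 192.168.1.19x range seen above):

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.1.190-192.168.1.199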

I can expose my deployment using:

kubectl expose deployment enrich-dev --type=LoadBalancer --name=enrich-expose

It all works perfectly fine and I can hit the web page from the external load-balanced IP address (I deleted the exposed service after this).

NAME             TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)           AGE
enrich-expose    LoadBalancer   10.108.43.157   192.168.1.192   31380:30170/TCP   73s
enrich-service   ClusterIP      10.98.163.217   <none>          80/TCP            57m
kubernetes       ClusterIP      10.96.0.1       <none>          443/TCP           36d
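
For comparison, that expose is roughly equivalent to the following Service (a sketch: port 31380 is taken from the listing above, and the app: enrich selector is an assumption matching the enrich-service manifest below):

apiVersion: v1
kind: Service
metadata:
  name: enrich-expose
spec:
  type: LoadBalancer        # MetalLB assigns the external IP (192.168.1.192 above)
  selector:
    app: enrich             # assumed to match the deployment's pod labels
  ports:
  - port: 31380
    targetPort: 31380
    protocol: TCP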

If I create a K8S Service in the default namespace (I've tried multiple namespaces)

apiVersion: v1
kind: Service
metadata:
  name: enrich-service
  labels:
    run: enrich-service
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
  selector:
    app: enrich

followed by a Gateway and a route (VirtualService), the only response I get from outside the mesh is a 404. You'll see the VirtualService below uses the reserved word mesh in its gateways field, but I've tried both that and naming the specific gateway (that variant is sketched after the DestinationRule below). I've also tried different match prefixes for specific URIs and the port you can see below.

Gateway

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: enrich-dev-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"

VirtualService

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: enrich-virtualservice
spec:
  hosts:
  - "enrich-service.default"
  gateways:
  - mesh
  http:
  - match:
    - port: 80
    route:
    - destination:
        host: enrich-service.default
        subset: v1
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: enrich-destination
spec:
  host: enrich-service.default
  trafficPolicy:
    loadBalancer:
      simple: LEAST_CONN
  subsets:
  - name: v1
    labels:
      app: enrich
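
For completeness, the variant naming the specific gateway looked roughly like this (same destination and subset as above); it behaved no differently:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: enrich-virtualservice
spec:
  hosts:
  - "*"                       # external clients hit the gateway by IP, so no specific host
  gateways:
  - enrich-dev-gateway        # bind to the ingress gateway defined above instead of "mesh"
  http:
  - route:
    - destination:
        host: enrich-service.default.svc.cluster.local
        port:
          number: 80
        subset: v1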

I've double-checked that it's not DNS playing up, because I can get a shell in the ingress-gateway either via busybox or using the K8S dashboard

http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/shell/istio-system/istio-ingressgateway-6bbdd58f8c-glzvx/?namespace=istio-system

and do both an

nslookup enrich-service.default

and

curl -f http://enrich-service.default/

and both work successfully, so I know the ingress-gateway pod can resolve and reach the service. The sidecars are set for auto-injection in both the default namespace and the istio-system namespace.
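
If it helps with suggestions, I can also dump the Envoy configuration the gateway received from Pilot with something like the following (pod name as in the dashboard URL above):

# Routes the ingress gateway actually received from Pilot
istioctl proxy-config routes istio-ingressgateway-6bbdd58f8c-glzvx -n istio-system -o json

# Listeners on the gateway (the port 80 listener should reference the HTTP route config)
istioctl proxy-config listeners istio-ingressgateway-6bbdd58f8c-glzvx -n istio-system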

The logs for the ingress-gateway show the 404:

[2018-11-01T03:07:54.351Z] "GET /metadata HTTP/1.1" 404 - 0 0 1 - "192.168.1.90" "curl/7.58.0" "6c1796be-0791-4a07-ac0a-5fb07bc3818c" "enrich-service.default" "-" - - 192.168.224.168:80 192.168.1.90:43500
[2018-11-01T03:26:39.339Z] "GET / HTTP/1.1" 404 - 0 0 1 - "192.168.1.90" "curl/7.58.0" "ed956af4-77b0-46e6-bd26-c153e29837d7" "enrich-service.default" "-" - - 192.168.224.168:80 192.168.1.90:53960

192.168.224.168:80 is the IP address of the gateway. 192.168.1.90:53960 is the IP address of my external client.

Any suggestions? I've tried hitting this from multiple angles for a couple of days now, and I feel I'm just missing something simple. Suggested logs to look at, perhaps?

Accepted answer

Just to close this question out with the solution to the problem in my instance: the configuration mistake started all the way back at Kubernetes cluster initialisation. I had run:

kubeadm init --pod-network-cidr=n.n.n.n/n --apiserver-advertise-address 0.0.0.0

with the pod-network-cidr set to the same address range as the local LAN on which the Kubernetes installation was deployed, i.e. the Ubuntu host's desktop network used the same IP subnet as the one I'd assigned to the container network.

For the most part everything operated fine, as detailed above, until the Istio proxy tried to route packets from an external load-balancer IP address to an internal IP address that happened to be on the same subnet. Project Calico with Kubernetes seemed able to cope with that, as it's effectively Layer 3/4 policy, but Istio had a problem with it at L7 (even though it was sitting on Calico underneath).

The solution was to tear down my entire Kubernetes deployment. I was paranoid and went so far as to uninstall Kubernetes and redeploy it with a pod network in the 172 range, which had nothing to do with my local LAN. I also made the same change in the Project Calico configuration file so the pod networks matched. After that change, everything worked as expected.
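
For illustration, the re-initialisation amounted to something like this (172.16.0.0/16 is just an example of a range that can't collide with a 192.168.1.x LAN; the Calico pool has to be changed to match):

# Tear down the existing cluster
sudo kubeadm reset

# Re-initialise with a pod network that cannot overlap the LAN subnet
sudo kubeadm init --pod-network-cidr=172.16.0.0/16 --apiserver-advertise-address=<host LAN IP>

# In calico.yaml, set the pool to the same range before applying it:
#   - name: CALICO_IPV4POOL_CIDR
#     value: "172.16.0.0/16"
kubectl apply -f calico.yaml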

I suspect that a more public configuration, where your cluster is directly attached to a BGP router rather than using MetalLB in an L2 configuration as a subset of your LAN, wouldn't exhibit this issue either. I've documented it in more detail in this post:

Microservices: .Net, Linux, Kubernetes and Istio make a powerful combination