Istio Envoy gateway connected to upstream but getting 404 response


Trying to wrap my head around Istio and the service mesh. I had a working cluster set up with an nginx ingress and cert-manager for TLS. I switched over to Istio with a Gateway/VirtualService setup, and as far as I can tell everything is connected, but when I try to access the site I get a blank screen (404 response in the network tab), and when I curl I see a 404. It's the same whether I hit the host directly or specify port 443. I'm not sure how to debug this: Istio's docs only mention a 404 when multiple gateways share the same TLS certificate, but I only have the one gateway at this time. Also, the Gateway and VirtualService are in the same namespace, and in the VirtualService the backend route (/api) is matched before the frontend route (/).

Here's the only error response I get, from a curl with OPTIONS; a plain curl returns nothing at all, not even a 403. In the GKE console all workloads are healthy, with no errors in the logs.

curl -X OPTIONS https://app.example.net -I
HTTP/2 404 
date: Wed, 29 Nov 2023 20:18:13 GMT
server: istio-envoy
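
A verbose curl is one way to confirm the request is actually terminating at the Istio gateway with the expected certificate (a sketch; -k skips verification in case the cert itself is the issue):

curl -vk https://app.example.net -o /dev/null 2>&1 | grep -E 'subject:|issuer:|HTTP/'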

The logs show connection to upstream:

2023-11-19T20:48:48.798743Z info    Readiness succeeded in 1.15333632s
2023-11-19T20:48:48.799470Z info    Envoy proxy is ready
2023-11-19T21:17:44.948873Z info    xdsproxy    connected to upstream XDS server: istiod.istio-system.svc:15012
2023-11-19T21:47:40.301270Z info    xdsproxy    connected to upstream XDS server: istiod.istio-system.svc:15012
2023-11-19T22:18:07.530190Z info    xdsproxy    connected to upstream XDS server: istiod.istio-system.svc:15012
...
2023-11-20T08:48:48.028231Z info    ads XDS: Incremental Pushing ConnectedEndpoints:2 Version:
2023-11-20T08:48:48.250424Z info    cache   generated new workload certificate  latency=221.620042ms ttl=23h59m59.749615036s
2023-11-20T09:17:09.369171Z info    xdsproxy    connected to upstream XDS server: istiod.istio-system.svc:15012
2023-11-20T09:46:07.080923Z info    xdsproxy    connected to upstream XDS server: istiod.istio-system.svc:15012
...

The mesh shows connected sidecars for the gateway, frontend, and backend:

$ istioctl proxy-status
NAME                                           CLUSTER        CDS        LDS        EDS        RDS        ECDS         ISTIOD                      VERSION
backend-deploy-67486897bb-fjv5g.demoapp        Kubernetes     SYNCED     SYNCED     SYNCED     SYNCED     NOT SENT     istiod-64c94c5d78-5879x     1.19.3
demoapp-gtw-istio-674b96dcdb-mfsfg.demoapp     Kubernetes     SYNCED     SYNCED     SYNCED     SYNCED     NOT SENT     istiod-64c94c5d78-5879x     1.19.3
frontend-deploy-6f6b4984b5-lnq4p.demoapp       Kubernetes     SYNCED     SYNCED     SYNCED     SYNCED     NOT SENT     istiod-64c94c5d78-5879x     1.19.3
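
Since the gateway proxy reports SYNCED, one way to go a level deeper is to check whether it actually opened a listener on 443 (same pod name as in the proxy-status output above):

istioctl proxy-config listeners demoapp-gtw-istio-674b96dcdb-mfsfg.demoapp --port 443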

Gateway

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demoapp-gtw
  namespace: demoapp
  annotations:
    cert-manager.io/issuer: "letsencrypt-prod"
spec:
  selector:
    istio: ingressgateway
  servers: 
  - port: 
      name: http
      number: 80
      protocol: HTTP
    hosts: [app.example.net]
    tls:
      httpsRedirect: true
  - port:
      name: https
      number: 443
      protocol: HTTPS
    hosts: [app.example.net]
    tls:
      mode: SIMPLE
      credentialName: demoapp-tls
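
One thing worth noting about the manifest above: the selector only binds these servers to proxies carrying the istio: ingressgateway label, and the gateway pod in proxy-status lives in demoapp rather than istio-system. A quick sanity check that the label and pod actually line up (commands assume defaults):

kubectl get pods -n istio-system -l istio=ingressgateway
kubectl get pods -n demoapp --show-labels | grep demoapp-gtw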

Virtual Service

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: vert-serv-from-gw
spec:
  hosts: [ app.example.net ]
  gateways: 
  - "demoapp/demoapp-gtw"
  - mesh
  http:
  - match:
    - uri:
        prefix: /api
    route:
    - destination:
        host: backend-svc
        port:
          number: 5000
    corsPolicy:
      allowOrigins:
      - exact: https://app.octodemo.net
      allowMethods:
      - PUT
      - GET
      - POST
      - PATCH
      - OPTIONS
      - DELETE
      allowHeaders:
      - DNT
      - X-CustomHeader
      - X-LANG
      - Keep-Alive
      - User-Agent
      - X-Requested-With
      - If-Modified-Since
      - Cache-Control
      - Content-Type
      - X-Api-Key
      - X-Device-Id
      - Access-Control-Allow-Origin
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: frontend-svc
        port:
          number: 3000

I'm not sure how to debug this further with no clear errors. If anyone has any suggestions, I'm all ears. Thanks
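
One low-effort debugging aid for anyone in the same spot: Envoy access logs make the gateway print a line per request, including a response flag such as NR ("no route configured") that distinguishes a routing miss from an app-level 404. A minimal sketch, assuming a default istioctl installation (the deployment name is inferred from the pod name above):

istioctl install --set meshConfig.accessLogFile=/dev/stdout
kubectl logs -n demoapp deploy/demoapp-gtw-istio -f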

EDIT: I think I've narrowed down a bit of what is happening. Running proxy-config routes for the gateway shows:

$ istioctl pc routes demoapp-gtw-istio-674b96dcdb-mfsfg.demoapp 
NAME                                                                            VHOST NAME        DOMAINS     MATCH                  VIRTUAL SERVICE
http.80                                                                         blackhole:80      *           /*                     404
https.443.default.demoapp-gtw-istio-autogenerated-k8s-gateway-https.demoapp     blackhole:443     *           /*                     404
                                                                                backend           *           /stats/prometheus*     
                                                                                backend           *           /healthz/ready*        
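
As a cross-check, istioctl analyze will usually call out why a listener falls through to a blackhole vhost, for example a VirtualService whose gateways: entry doesn't match any Gateway:

istioctl analyze -n demoapp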

From the Istio docs, my understanding of the blackhole and passthrough clusters is that the blackhole exists to block unauthorized ingress and egress traffic to mesh services, but that the default is passthrough (ALLOW_ANY). In the istio configmap below I don't see either one set explicitly (I took the cue to check from here):

$ kubectl get configmap istio -n istio-system -o yaml 
apiVersion: v1
data:
  mesh: |-
    defaultConfig:
      discoveryAddress: istiod.istio-system.svc:15012
      proxyMetadata: {}
      tracing:
        zipkin:
          address: zipkin.istio-system:9411
    defaultProviders:
      metrics:
      - prometheus
    enablePrometheusMerge: true
    rootNamespace: istio-system
    trustDomain: cluster.local
  meshNetworks: 'networks: {}'
kind: ConfigMap
metadata:
  creationTimestamp: "2023-10-26T17:45:35Z"
  labels:
    install.operator.istio.io/owning-resource: installed-state
    install.operator.istio.io/owning-resource-namespace: istio-system
    istio.io/rev: default
    operator.istio.io/component: Pilot
    operator.istio.io/managed: Reconcile
    operator.istio.io/version: 1.19.3
    release: istio
  name: istio
  namespace: istio-system
  resourceVersion: "69895477"
  uid: 3c542bc5-5f9f-4486-a37c-2c04fadba0ed

Maybe that's because my version isn't up to date enough?

$ istioctl version
client version: 1.20.0
control plane version: 1.19.3
data plane version: 1.19.3 (3 proxies)

Regardless, my routes from the gateway to the services shouldn't be getting blackholed, since they are declared in the VirtualService... right?
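
For what it's worth, if the mesh-level default were in doubt, it can be pinned explicitly; per the docs, ALLOW_ANY is already the default when outboundTrafficPolicy is absent from the configmap (a sketch, and it requires re-running the installer):

istioctl install --set meshConfig.outboundTrafficPolicy.mode=ALLOW_ANY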

1 Answer

Well, I don't have a solution, but I'm fairly certain I've found the problem.

The Istio routes are pointing to services that belong to another namespace:

$ istioctl pc routes backend-deploy-7f584f9fd7-mn5z4.demoapp
NAME                                                  VHOST NAME                                                DOMAINS                                                      MATCH                  VIRTUAL SERVICE
test-frontend-svc.demotest.svc.cluster.local:3000     test-frontend-svc.demotest.svc.cluster.local:3000         *                                                            /*                     
9090                                                  kiali.istio-system.svc.cluster.local:9090                 kiali.istio-system, 10.92.12.180                             /*                     
                                                      backend                                                   *                                                            /healthz/ready*        
inbound|80||                                          inbound|http|80                                           *                                                            /*                     
inbound|80||                                          inbound|http|80                                           *                                                            /*                     
test-backend-svcs.demotest.svc.cluster.local:5000     test-backend-svcs.demotest.svc.cluster.local:5000         *                                                            /*                     
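
Side note: by default every sidecar imports routes for every Service in the mesh, so demotest services showing up in this pod's route table is expected behavior rather than misrouting by itself. If the intent is to scope what demoapp sidecars can see, a Sidecar resource would do it (a sketch, assuming only same-namespace services and the control plane are needed):

apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: demoapp
spec:
  egress:
  - hosts:
    - "./*"              # services in the same namespace
    - "istio-system/*"   # control plane and telemetry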

Based on a GitHub answer to another user's question (from 2019), "My understanding was that this is a known limitation with existing workaround: using distinct names for the ports solves the issue.", I changed the port names to make them unique per namespace and shifted the port numbers by 1, but Istio is still pointing to the wrong services on the old port names.
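
For reference, the rename described above would land on the Service side, roughly like this (a sketch, since the actual manifests aren't shown; the selector label is assumed):

apiVersion: v1
kind: Service
metadata:
  name: backend-svc
  namespace: demoapp
spec:
  selector:
    app: backend               # assumed pod label
  ports:
  - name: http-backend-demoapp # port name made unique per namespace
    port: 5001                 # shifted by 1, per the text above
    targetPort: 5000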

Here's the updated virtual service after those changes:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: vert-serv-from-gw
spec:
  hosts: [ app.octodemo.net ]
  gateways: 
  - "demoapp/demoapp-gtw"
  - mesh
  http:
  - match:
    - uri:
        prefix: /api
    route:
    - destination:
        host: backend-svc
        port:
          number: 5001
    corsPolicy:
      allowOrigins:
      - exact: https://app.octodemo.net
      allowMethods:
      - PUT
      - GET
      - POST
      - PATCH
      - OPTIONS
      - DELETE
      allowHeaders:
      - DNT
      - X-CustomHeader
      - X-LANG
      - Keep-Alive
      - User-Agent
      - X-Requested-With
      - If-Modified-Since
      - Cache-Control
      - Content-Type
      - X-Api-Key
      - X-Device-Id
      - Access-Control-Allow-Origin
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: frontend-svc
        port:
          number: 3001

That did not work: as shown above, Istio continues to point to the other namespace's services (test-backend-svcs and test-frontend-svc). While digging through the docs, I found this note about routes:

Note for Kubernetes users: When short names are used (e.g. “reviews” instead of “reviews.default.svc.cluster.local”), Istio will interpret the short name based on the namespace of the rule, not the service. A rule in the “default” namespace containing a host “reviews” will be interpreted as “reviews.default.svc.cluster.local”, irrespective of the actual namespace associated with the reviews service. To avoid potential misconfigurations, it is recommended to always use fully qualified domain names over short names.

So I tried that, using the fully qualified names from the service registry (backend-svc.demoapp.svc.cluster.local and frontend-svc.demoapp.svc.cluster.local), following this post's approach, and I'm still getting the same result: the routes only show services from the other namespace, which has never been configured.
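
Concretely, only the two destination blocks change from the VirtualService above (fragments; everything else stays the same):

    - destination:
        host: backend-svc.demoapp.svc.cluster.local
        port:
          number: 5001

    - destination:
        host: frontend-svc.demoapp.svc.cluster.local
        port:
          number: 3001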

There isn't even a Gateway or VirtualService in the other namespace; the only step I had taken there was enabling sidecar auto-injection. So despite changes that point ever more specifically at the correct services (not that they should have been needed), Istio is still routing to the services in another namespace on the wrong ports, and I don't know how or why. I'm at a loss for what to do other than dump the cluster and start fresh. If anyone has any idea how this came about, or has hit a similar issue, please let me know, because nothing so far resolves the problem or points to something to avoid going forward.
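
For anyone retracing this, the injection label is the only thing that was ever set in the other namespace; it (plus a full-mesh lint) can be checked with:

kubectl get namespace -L istio-injection
istioctl analyze --all-namespaces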