I've been trying to get the http-01 challenge method working with Traefik v2 and cert-manager, both installed through their current Helm charts. The load balancer endpoint responds on both its IP and its hostname, and I've tested that the HTTP host passes on letsdebug ("No issues were found with <domain>").

Traefik lives in the traefik namespace, while cert-manager lives in its own cert-manager namespace. I've created a ClusterIssuer inside the cert-manager namespace:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
      - http01:
          ingress:
            class: traefik
            ingressTemplate:
              metadata:
                namespace: cert-manager
                annotations:
                  traefik.ingress.kubernetes.io/router.entrypoints: web

The ingressTemplate part is my attempt at making the randomly created ingress from cert-manager map to the correct traefik endpoint - this hasn't changed anything, but I leave it in in case I've fubared anything here.

I've then created a Certificate and applied it. I've tried each of the cert-manager, traefik and default namespaces for this, with the same result every time (the actual domain name has been replaced with domain.example.com):

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: domain.example.com
spec:
  secretName: domain-example-com-tls
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-staging
  commonName: domain.example.com
  dnsNames:
    - domain.example.com

Looking at the logs for the cert-manager pod, I can see both a 404 error and then a "DNS A record error". The DNS record error seems spurious, as the record resolves fine from other services and has been in place for more than 24 hours.

I0413 12:37:51.478359       1 conditions.go:201] Setting lastTransitionTime for Certificate "domain.example.com" condition "Issuing" to 2022-04-13 12:37:51.478353098 +0000 UTC m=+6998.327004050
I0413 12:37:51.760018       1 controller.go:161] cert-manager/certificates-key-manager "msg"="re-queuing item due to optimistic locking on resource" "key"="default/domain.example.com" "error"="Operation cannot be fulfilled on certificates.cert-manager.io \"domain.example.com\": the object has been modified; please apply your changes to the latest version and try again"
I0413 12:37:51.769026       1 conditions.go:261] Setting lastTransitionTime for CertificateRequest "domain.example.com-r98k2" condition "Approved" to 2022-04-13 12:37:51.769016958 +0000 UTC m=+6998.617667914
I0413 12:37:51.836517       1 conditions.go:261] Setting lastTransitionTime for CertificateRequest "domain.example.com-r98k2" condition "Ready" to 2022-04-13 12:37:51.836496254 +0000 UTC m=+6998.685147170
I0413 12:37:51.868932       1 conditions.go:261] Setting lastTransitionTime for CertificateRequest "domain.example.com-r98k2" condition "Ready" to 2022-04-13 12:37:51.868921204 +0000 UTC m=+6998.717572135
I0413 12:37:51.888553       1 controller.go:161] cert-manager/certificaterequests-issuer-acme "msg"="re-queuing item due to optimistic locking on resource" "key"="default/domain.example.com-r98k2" "error"="Operation cannot be fulfilled on certificaterequests.cert-manager.io \"domain.example.com-r98k2\": the object has been modified; please apply your changes to the latest version and try again"
E0413 12:37:53.529269       1 controller.go:210] cert-manager/challenges/scheduler "msg"="error scheduling challenge for processing" "error"="Operation cannot be fulfilled on challenges.acme.cert-manager.io \"domain.example.com-r98k2-2809069211-587139531\": the object has been modified; please apply your changes to the latest version and try again" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1"
I0413 12:37:55.028477       1 pod.go:71] cert-manager/challenges/http01/ensurePod "msg"="creating HTTP01 challenge solver pod" "dnsName"="domain.example.com" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0413 12:37:55.237109       1 pod.go:59] cert-manager/challenges/http01/selfCheck/http01/ensurePod "msg"="found one existing HTTP01 solver pod" "dnsName"="domain.example.com" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-k8wl8" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0413 12:37:55.237350       1 service.go:43] cert-manager/challenges/http01/selfCheck/http01/ensureService "msg"="found one existing HTTP01 solver Service for challenge resource" "dnsName"="domain.example.com" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-gvvkt" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0413 12:37:55.237539       1 ingress.go:99] cert-manager/challenges/http01/selfCheck/http01/ensureIngress "msg"="found one existing HTTP01 solver ingress" "dnsName"="domain.example.com" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-pbs7c" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0413 12:37:55.260608       1 sync.go:186] cert-manager/challenges "msg"="propagation check failed" "error"="wrong status code '404', expected '200'" "dnsName"="domain.example.com" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0413 12:37:55.299879       1 pod.go:59] cert-manager/challenges/http01/selfCheck/http01/ensurePod "msg"="found one existing HTTP01 solver pod" "dnsName"="domain.example.com" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-k8wl8" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0413 12:37:55.300223       1 service.go:43] cert-manager/challenges/http01/selfCheck/http01/ensureService "msg"="found one existing HTTP01 solver Service for challenge resource" "dnsName"="domain.example.com" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-gvvkt" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0413 12:37:55.300570       1 ingress.go:99] cert-manager/challenges/http01/selfCheck/http01/ensureIngress "msg"="found one existing HTTP01 solver ingress" "dnsName"="domain.example.com" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-pbs7c" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0413 12:37:55.316802       1 sync.go:186] cert-manager/challenges "msg"="propagation check failed" "error"="wrong status code '404', expected '200'" "dnsName"="domain.example.com" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0413 12:38:05.261345       1 pod.go:59] cert-manager/challenges/http01/selfCheck/http01/ensurePod "msg"="found one existing HTTP01 solver pod" "dnsName"="domain.example.com" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-k8wl8" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0413 12:38:05.263416       1 service.go:43] cert-manager/challenges/http01/selfCheck/http01/ensureService "msg"="found one existing HTTP01 solver Service for challenge resource" "dnsName"="domain.example.com" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-gvvkt" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0413 12:38:05.263822       1 ingress.go:99] cert-manager/challenges/http01/selfCheck/http01/ensureIngress "msg"="found one existing HTTP01 solver ingress" "dnsName"="domain.example.com" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-pbs7c" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0413 12:38:25.541964       1 sync.go:386] cert-manager/challenges/acceptChallenge "msg"="error waiting for authorization" "error"="context deadline exceeded" "dnsName"="domain.example.com" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0413 12:38:25.542087       1 controller.go:166] cert-manager/challenges "msg"="re-queuing item due to error processing" "error"="context deadline exceeded" "key"="default/domain.example.com-r98k2-2809069211-587139531"
I0413 12:38:30.542803       1 pod.go:59] cert-manager/challenges/http01/selfCheck/http01/ensurePod "msg"="found one existing HTTP01 solver pod" "dnsName"="domain.example.com" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-k8wl8" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0413 12:38:30.543062       1 service.go:43] cert-manager/challenges/http01/selfCheck/http01/ensureService "msg"="found one existing HTTP01 solver Service for challenge resource" "dnsName"="domain.example.com" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-gvvkt" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0413 12:38:30.543218       1 ingress.go:99] cert-manager/challenges/http01/selfCheck/http01/ensureIngress "msg"="found one existing HTTP01 solver ingress" "dnsName"="domain.example.com" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-pbs7c" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0413 12:38:46.682039       1 sync.go:386] cert-manager/challenges/acceptChallenge "msg"="error waiting for authorization" "error"="acme: authorization error for domain.example.com: 400 urn:ietf:params:acme:error:dns: During secondary validation: DNS problem: query timed out looking up A for domain.example.com; DNS problem: query timed out looking up AAAA for domain.example.com" "dnsName"="domain.example.com" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0413 12:38:46.888731       1 controller.go:102] ingress 'default/cm-acme-http-solver-pbs7c' in work queue no longer exists

Looking at Traefik's pod log, I can see that the ingress gets created, but that Traefik is unable to route any requests to it because it can't find the endpoint (this is what I tried to fix with the annotation in the ingressTemplate above):

time="2022-04-13T12:37:57Z" level=error msg="Skipping service: no endpoints found" providerName=kubernetes namespace=default servicePort="&ServiceBackendPort{Name:,Number:8089,}" ingress=cm-acme-http-solver-pbs7c serviceName=cm-acme-http-solver-gvvkt
time="2022-04-13T12:38:46Z" level=error msg="Skipping service: no endpoints found" serviceName=cm-acme-http-solver-gvvkt servicePort="&ServiceBackendPort{Name:,Number:8089,}" providerName=kubernetes ingress=cm-acme-http-solver-pbs7c namespace=default
time="2022-04-13T12:38:46Z" level=error msg="Cannot create service: service not found" servicePort="&ServiceBackendPort{Name:,Number:8089,}" providerName=kubernetes ingress=cm-acme-http-solver-pbs7c namespace=default serviceName=cm-acme-http-solver-gvvkt
time="2022-04-13T12:38:46Z" level=error msg="Cannot create service: service not found" servicePort="&ServiceBackendPort{Name:,Number:8089,}" namespace=default providerName=kubernetes serviceName=cm-acme-http-solver-gvvkt ingress=cm-acme-http-solver-pbs7c

And that's where I'm currently stuck, since the plan is to use Traefik's IngressRoute CRD for exposing hosts rather than regular Ingress entries. Another option would be to test the experimental Gateway support, but as this is the initial setup for a prod cluster I'm not planning to go down that route yet.
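For context, this is roughly what such an IngressRoute could look like once the Certificate above has been issued. It is only a sketch under a few assumptions: the `websecure` entry point name, the `my-app` backend service and the resource name are hypothetical, and the secret is assumed to live in the same namespace as the IngressRoute.

```yaml
# Sketch only: values marked "hypothetical" do not come from the setup above.
apiVersion: traefik.containo.us/v1alpha1   # CRD group used by Traefik v2
kind: IngressRoute
metadata:
  name: domain-example-com                 # hypothetical
  namespace: default                       # must match the namespace of the TLS secret
spec:
  entryPoints:
    - websecure                            # assumed name of the HTTPS entry point
  routes:
    - match: Host(`domain.example.com`)
      kind: Rule
      services:
        - name: my-app                     # hypothetical backend Service
          port: 80
  tls:
    secretName: domain-example-com-tls     # secretName from the Certificate above
```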

Any ideas or further debug information that could be useful?


We faced the same issue. The problem was that the Ingress generated by cert-manager referenced the Ingress controller through the deprecated annotation kubernetes.io/ingress.class.

What we wanted:

spec:
  ingressClassName: my-traefik-controller

What we got:

annotations:
  kubernetes.io/ingress.class: "my-traefik-controller"

This way, the Traefik Ingress controller found the Ingress, but was not able to find the service. There is a whole discussion on this topic in the cert-manager GitHub repo.

The solution was to use the cert-manager annotation acme.cert-manager.io/http01-edit-in-place: "true" on an existing Ingress:

metadata:
  annotations:
    cert-manager.io/cluster-issuer: my-issuer
    acme.cert-manager.io/http01-edit-in-place: "true"
spec:
  ingressClassName: my-traefik-controller

This way, only the existing Ingress (containing the correct ingressClassName reference) gets modified and no new solver Ingress gets created.
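As a fuller illustration, a complete Ingress using this approach might look like the sketch below. The resource name, namespace, backend service and secret name are assumptions made for the example, not values from our setup. The `tls` block is included because cert-manager derives the Certificate from it when the `cert-manager.io/cluster-issuer` annotation is present.

```yaml
# Sketch only: values marked "hypothetical" are placeholders for illustration.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: domain-example-com                  # hypothetical
  namespace: default                        # hypothetical
  annotations:
    cert-manager.io/cluster-issuer: my-issuer
    # Add the challenge path to this Ingress instead of creating a solver Ingress
    acme.cert-manager.io/http01-edit-in-place: "true"
spec:
  ingressClassName: my-traefik-controller
  tls:
    - hosts:
        - domain.example.com
      secretName: domain-example-com-tls    # hypothetical; the issued cert is stored here
  rules:
    - host: domain.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app                # hypothetical backend Service
                port:
                  number: 80
```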


I encountered the same error while setting up a home Kubernetes environment. In my case the cause was NAT loopback (also called hairpinning or reflection): certain ISPs restrict the ability to reach hosts within a private network using the public IP address, so cert-manager's self-check from inside the cluster could not reach the domain.

The solution was to modify the coredns ConfigMap and add the following entry:

rewrite name domain.example.com traefik.<traefik-namespace>.svc.cluster.local

Then simply restart the coredns pod. Hope this helps someone.
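To show where the entry goes, here is a sketch of the ConfigMap with the rewrite line in place. It assumes the ConfigMap is named `coredns` in `kube-system`, and that Traefik's Service is named `traefik` in the `traefik` namespace. The other Corefile lines are only an example of a typical default and will differ per cluster, so it is safer to add the single rewrite line to your existing Corefile (for instance with `kubectl -n kube-system edit configmap coredns`) than to apply this as-is.

```yaml
# Sketch only: keep your cluster's existing Corefile and add just the rewrite line.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        ready
        # Resolve the public hostname to the in-cluster Traefik Service
        # (assumes a Service named "traefik" in the "traefik" namespace)
        rewrite name domain.example.com traefik.traefik.svc.cluster.local
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
```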