Health check times out for managed instance group

714 Views Asked by At

Problem

I have a health check for a managed instance group on GCP that continuously times out. Therefore, the group manager thinks all instances are always unhealthy.

enter image description here

My guess is I've misconfigured the firewall, but I can't determine if that's the case or how I've misconfigured it.

Note: I'm not asking about the load balance health checks. I'm asking about the managed instance group health checks for auth-healing.

Error

In the "Errors" tab of the managed instance group, I always see the following error:

WAITING_FOR_HEALTHY_TIMEOUT_EXCEEDED

Waiting for HEALTHY state timed out (autohealingPolicy.initialDelay=300 sec) for instance projects/project-402019/zones/us-central1-f/instances/server-20bd and health check projects/project-402019/global/healthChecks/server-health-check.

Debug steps

  • When I ssh into a VM and run curl http://127.0.0.1:3001/hello, I successfully get http://127.0.0.1:3001/hello (I'm running a simple echo service right now).
  • I've set up the firewall to allow incomming traffic on port 3001 as described in the docs. See firewall setup below.
  • Set the initialization period and autohealing initial delay to 5+ minutes in case it takes forever to start up (I can ssh into the VM after 30s and curl successfully).
  • Set the health check timeout to 30 seconds (when I curl it responds instantly).
  • Opened all ports on the firewall.

Setup

Firewall

enter image description here

Health check

I've also tried multiple different settings here:

  • TCP vs. HTTP
  • Intervals/timeouts ranging from 2-30 seconds.
  • Healthy/unhealthy thresholds from 1-3.

enter image description here

1

There are 1 best solutions below

0
On

Turns out your binary needs to listen on 0.0.0.0, not 127.0.0.1.

Once I made that change to my code, everything else worked.