varnish 503 is wrong and backend_health has an error of 400

974 Views Asked by At

enter image description here enter image description here

May I ask what causes this problem? This problem has been bothering me for a long time. How should I solve this problem? thank you in advance.'

this is my vcl, I actually tried /health_check.php and /pub/health_check.php

# VCL version 5.0 is not supported so it should be 4.0 even though actually used Varnish version is 6
vcl 4.0;

import std;
# The minimal Varnish version is 6.0
# For SSL offloading, pass the following header in your proxy server or load balancer: 'X-Forwarded-Proto: https'

backend default {
    .host = "localhost";
    .port = "8080";
    .first_byte_timeout = 600s;
    .probe = {
        .url = "health_check.php";
        .timeout = 2s;
        .interval = 5s;
        .window = 10;
        .threshold = 5;
   }
}

acl purge {
    "localhost";
    "127.0.0.1";
    "::1";
}

sub vcl_recv {
    if (req.restarts > 0) {
        set req.hash_always_miss = true;
    }

    if (req.method == "PURGE") {
        if (client.ip !~ purge) {
            return (synth(405, "Method not allowed"));
        }
        # To use the X-Pool header for purging varnish during automated deployments, make sure the X-Pool header
        # has been added to the response in your backend server config. This is used, for example, by the
        # capistrano-magento2 gem for purging old content from varnish during it's deploy routine.
        if (!req.http.X-Magento-Tags-Pattern && !req.http.X-Pool) {
            return (synth(400, "X-Magento-Tags-Pattern or X-Pool header required"));
        }
        if (req.http.X-Magento-Tags-Pattern) {
          ban("obj.http.X-Magento-Tags ~ " + req.http.X-Magento-Tags-Pattern);
        }
        if (req.http.X-Pool) {
          ban("obj.http.X-Pool ~ " + req.http.X-Pool);
        }
        return (synth(200, "Purged"));
    }

    if (req.method != "GET" &&
        req.method != "HEAD" &&
        req.method != "PUT" &&
        req.method != "POST" &&
        req.method != "TRACE" &&
        req.method != "OPTIONS" &&
        req.method != "DELETE") {
          /* Non-RFC2616 or CONNECT which is weird. */
          return (pipe);
    }

    # We only deal with GET and HEAD by default
    if (req.method != "GET" && req.method != "HEAD") {
        return (pass);
    }

    # Bypass customer, shopping cart, checkout
    if (req.url ~ "/customer" || req.url ~ "/checkout") {
        return (pass);
    }

    # Bypass health check requests
    if (req.url ~ "^/(pub/)?(health_check.php)$") {
        return (pass);
    }

    # Set initial grace period usage status
    set req.http.grace = "none";

    # normalize url in case of leading HTTP scheme and domain
    set req.url = regsub(req.url, "^http[s]?://", "");

    # collect all cookies
    std.collect(req.http.Cookie);

    # Compression filter. See https://www.varnish-cache.org/trac/wiki/FAQ/Compression
    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpg|jpeg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|flv)$") {
            # No point in compressing these
            unset req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate" && req.http.user-agent !~ "MSIE") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unknown algorithm
            unset req.http.Accept-Encoding;
        }
    }

    # Remove all marketing get parameters to minimize the cache objects
    if (req.url ~ "(\?|&)(gclid|cx|ie|cof|siteurl|zanpid|origin|fbclid|mc_[a-z]+|utm_[a-z]+|_bta_[a-z]+)=") {
        set req.url = regsuball(req.url, "(gclid|cx|ie|cof|siteurl|zanpid|origin|fbclid|mc_[a-z]+|utm_[a-z]+|_bta_[a-z]+)=[-_A-z0-9+()%.]+&?", "");
        set req.url = regsub(req.url, "[?|&]+$", "");
    }

    # Static files caching
    if (req.url ~ "^/(pub/)?(media|static)/") {
        # Static files should not be cached by default
        return (pass);

        # But if you use a few locales and don't use CDN you can enable caching static files by commenting previous line (#return (pass);) and uncommenting next 3 lines
        #unset req.http.Https;
        #unset req.http.X-Forwarded-Proto;
        #unset req.http.Cookie;
    }

    # Authenticated GraphQL requests should not be cached by default
    if (req.url ~ "/graphql" && req.http.Authorization ~ "^Bearer") {
        return (pass);
    }

    return (hash);
}

sub vcl_hash {
    if (req.http.cookie ~ "X-Magento-Vary=") {
        hash_data(regsub(req.http.cookie, "^.*?X-Magento-Vary=([^;]+);*.*$", "\1"));
    }

    # To make sure http users don't see ssl warning
    if (req.http.X-Forwarded-Proto) {
        hash_data(req.http.X-Forwarded-Proto);
    }
    

    if (req.url ~ "/graphql") {
        call process_graphql_headers;
    }
}

sub process_graphql_headers {
    if (req.http.Store) {
        hash_data(req.http.Store);
    }
    if (req.http.Content-Currency) {
        hash_data(req.http.Content-Currency);
    }
}

sub vcl_backend_response {

    set beresp.grace = 3d;

    if (beresp.http.content-type ~ "text") {
        set beresp.do_esi = true;
    }

    if (bereq.url ~ "\.js$" || beresp.http.content-type ~ "text") {
        set beresp.do_gzip = true;
    }

    if (beresp.http.X-Magento-Debug) {
        set beresp.http.X-Magento-Cache-Control = beresp.http.Cache-Control;
    }

    # cache only successfully responses and 404s
    if (beresp.status != 200 && beresp.status != 404) {
        set beresp.ttl = 0s;
        set beresp.uncacheable = true;
        return (deliver);
    } elsif (beresp.http.Cache-Control ~ "private") {
        set beresp.uncacheable = true;
        set beresp.ttl = 86400s;
        return (deliver);
    }

    # validate if we need to cache it and prevent from setting cookie
    if (beresp.ttl > 0s && (bereq.method == "GET" || bereq.method == "HEAD")) {
        unset beresp.http.set-cookie;
    }

   # If page is not cacheable then bypass varnish for 2 minutes as Hit-For-Pass
   if (beresp.ttl <= 0s ||
       beresp.http.Surrogate-control ~ "no-store" ||
       (!beresp.http.Surrogate-Control &&
       beresp.http.Cache-Control ~ "no-cache|no-store") ||
       beresp.http.Vary == "*") {
        # Mark as Hit-For-Pass for the next 2 minutes
        set beresp.ttl = 120s;
        set beresp.uncacheable = true;
    }

    return (deliver);
}

sub vcl_deliver {
    if (resp.http.X-Magento-Debug) {
        if (resp.http.x-varnish ~ " ") {
            set resp.http.X-Magento-Cache-Debug = "HIT";
            set resp.http.Grace = req.http.grace;
        } else {
            set resp.http.X-Magento-Cache-Debug = "MISS";
        }
    } else {
        unset resp.http.Age;
    }

    # Not letting browser to cache non-static files.
    if (resp.http.Cache-Control !~ "private" && req.url !~ "^/(pub/)?(media|static)/") {
        set resp.http.Pragma = "no-cache";
        set resp.http.Expires = "-1";
        set resp.http.Cache-Control = "no-store, no-cache, must-revalidate, max-age=0";
    }

    unset resp.http.X-Magento-Debug;
    unset resp.http.X-Magento-Tags;
    unset resp.http.X-Powered-By;
    unset resp.http.Server;
    unset resp.http.X-Varnish;
    unset resp.http.Via;
    unset resp.http.Link;
}

sub vcl_hit {
    if (obj.ttl >= 0s) {
        # Hit within TTL period
        return (deliver);
    }
    if (std.healthy(req.backend_hint)) {
        if (obj.ttl + 300s > 0s) {
            # Hit after TTL expiration, but within grace period
            set req.http.grace = "normal (healthy server)";
            return (deliver);
        } else {
            # Hit after TTL and grace expiration
            return (restart);
        }
    } else {
        # server is not healthy, retrieve from cache
        set req.http.grace = "unlimited (unhealthy server)";
        return (deliver);
    }
}

My program is magento and my health_check.php file is in pub

this is my nginx error

2022/11/04 09:15:23 [crit] 3378368#0: *538 SSL_do_handshake() failed (SSL: error:141CF06C:SSL routines:tls_parse_ctos_key_share:bad key share) while SSL handshaking, client: *.*.*.224, server: 0.0.0.0:443

2022/11/04 09:37:30 [notice] 3383586#0: signal process started

I Runing varnishlog -g request -q "ReqUrl eq '/health_check.php'", no response

I ran sudo varnishlog -g request -q "ReqUrl eq '/'", no response. And then I ran sudo varnishlog -g request -q "VCL_call eq 'BACKEND_ERROR'" return 503

1

There are 1 best solutions below

8
Thijs Feryn On

It's clear that the 400 error of the health probe causes the backend to be unhealthy. Varnish will return a 503 because the backend is unhealthy regardless of the status code.

Please share the backend & probe configuration from your VCL file so I can figure out how the health checking endpoint is probed.

It also makes sense for you to run a varnishlog -g request -q "ReqUrl eq '/health_check'" where /health_check is replaced by the URL of the health checking probe.

Please also have a look at the webserver logs of the backend to see if there's any indication on why the HTTP 400 status code is returned.

Update: run the health check manually

The Nginx error logs didn't contain anything useful. Please check the access logs as well. If there is useful information in there, don't hesitate to add it to your question.

Please also check the Magento (debug) logs for more information.

You could also run curl -I http://localhost:8080/health_check.php from the command line on the Varnish server to see if anything specific comes up in the output for the script or in the Magento logs.

Update 2: provide the right health check host header

It looks as though your Nginx server doesn't route requests coming from localhost to the right Magento virtual host.

In that case I suggest adding the .host_header property to your backend definition as illustrated below:

backend default {
    .host = "localhost";
    .host_header = "mydomain";
    .port = "8080";
    .first_byte_timeout = 600s;
    .probe = {
        .url = "/health_check.php";
        .timeout = 2s;
        .interval = 5s;
        .window = 10;
        .threshold = 5;
   }
}

This host header will ensure that the right host header is sent to the backend while performing health checks.

Since you mentioned in your comment that https://mydomain/health_check.php returns a valid 200 OK status code, this seems like a first step in the right direction.

Please update .host_header = "mydomain" and ensure the right domain name is used.

Update 3: figure out why the page doesn't load

Your backend seems to be healthy now (based on the comments), but the site still doesn't work.

In that case, run the following command to debug:

sudo varnishlog -g request -q "ReqUrl eq '/'"

This command will display the full VSL transaction log for the homepage. Go ahead and run it and add the full output to your question in case you need help.

Update 4: first byte timeout issues

The logs you provided give a clear indication of what's causing the failure.

Here's the relevant log extract:

BackendOpen     31 default 127.0.0.1 8080 127.0.0.1 57786
BackendStart    127.0.0.1 8080
Timestamp       Bereq: 1668044657.403995 0.000161 0.000161
FetchError      first byte timeout
BackendClose    31 default
Timestamp       Beresp: 1668045257.502267 600.098433 600.098272
Timestamp       Error: 1668045257.502275 600.098441 0.000008

Varnish is opening up a connection to the backend via a backend fetch thread and sends the backend request. It took Varnish 0.000161 seconds to send the request.

Now the Varnish backend thread is waiting for the backend to respond. The first byte timeout is set to 600 seconds, which is incredibly long. However, the timeout is still triggered.

The Timestamp Beresp log line indicates that it took Varnish 600.098272 to receive the backend response. But we know that this is just a timeout and the error is triggered directly after that.

Something really weird is going on and you need to figure out why your backend is taking so long to respond. This doesn't really have anything to do with Varnish itself, but it could be that your application behaves in a strange way when a proxy server is put in front of it. It's also possible that your backend application is just slow under all circumstances. That's information I don't have and that's beyond the scope of this question.