How to determine why a object was removed from the Varnish Cache?

42 Views Asked by At

We have a Varnish server running in front of our Magento site. We find that Magento is pretty slow when it comes to serving pages, and as such we want to have Varnish serve all category and product pages from the cache.

We've made changes to the VCL file to ensure that the TTL and Grace periods are getting set to 365d to ensure pages are kept in memory for as long as possible. We run a cache warmer daily which looks at the site map and makes a request to each URL in order to warm the cache. The issue we are seeing is that some pages that at the time of the cache warmer running were warm (according to the HIT/MISS header) are no longer getting a cache hit upon later review.

I'm aware of a few ways that an object can be invalidated from the cache:

  • Purge Requests (these are disabled in our VCL)
  • Ban Requests (we make use of these to ban categories and product pages when stock changes)
  • TTL (our TTL is set to 365d, and a full rewarm was done, so shouldn't be aging away)
  • Nuked (confirmed via varnishstat that we do not have any n_lru_nuked objects)

Is there a way to determine which of the above situations caused a object to be invalidated using the URL for the page in question? That would help us track down what is causing the objects to be invalidated and to prevent this from happening.

VCL in case it is relevant:

# VCL version 5.0 is not supported so it should be 4.0 even though actually used Varnish version is 5
vcl 4.0;

import std;
# The minimal Varnish version is 5.0
# For SSL offloading, pass the following header in your proxy server or load balancer: 'X-Forwarded-Proto: https'

backend default {
    .host = "127.0.0.1";
    .port = "8181";
    .first_byte_timeout = 600s;
    .probe = {
        .url = "/pub/health_check.php";
        .timeout = 2s;
        .interval = 5s;
        .window = 10;
        .threshold = 5;
   }
}

#acl purge {
#    "127.0.0.1";
#}

sub vcl_recv {
    if (req.method == "PURGE") {
        #if (client.ip !~ purge) {
            return (synth(405, "Method not allowed"));
        #}
        # To use the X-Pool header for purging varnish during automated deployments, make sure the X-Pool header
        # has been added to the response in your backend server config. This is used, for example, by the
        # capistrano-magento2 gem for purging old content from varnish during it's deploy routine.
        #if (!req.http.X-Magento-Tags-Pattern && !req.http.X-Pool) {
        #    return (synth(400, "X-Magento-Tags-Pattern or X-Pool header required"));
        #}
        #if (req.http.X-Magento-Tags-Pattern) {
        #  ban("obj.http.X-Magento-Tags ~ " + req.http.X-Magento-Tags-Pattern);
        #}
        #if (req.http.X-Pool) {
        #  ban("obj.http.X-Pool ~ " + req.http.X-Pool);
        #}
        #return (synth(200, "Purged"));
    }

    if (req.method != "GET" &&
        req.method != "HEAD" &&
        req.method != "PUT" &&
        req.method != "POST" &&
        req.method != "TRACE" &&
        req.method != "OPTIONS" &&
        req.method != "DELETE") {
          /* Non-RFC2616 or CONNECT which is weird. */
          return (pipe);
    }

    # We only deal with GET and HEAD by default
    if (req.method != "GET" && req.method != "HEAD") {
        return (pass);
    }

    # Bypass shopping cart, checkout and search requests
    if (req.url ~ "/checkout" || req.url ~ "/catalogsearch") {
        return (pass);
    }

    # Bypass health check requests
    if (req.url ~ "/pub/health_check.php") {
        return (pass);
    }

    # Set initial grace period usage status
    set req.http.grace = "none";

    # normalize url in case of leading HTTP scheme and domain
    set req.url = regsub(req.url, "^http[s]?://", "");

    # collect all cookies
    std.collect(req.http.Cookie);

    # Compression filter. See https://www.varnish-cache.org/trac/wiki/FAQ/Compression
    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpg|jpeg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|flv)$") {
            # No point in compressing these
            unset req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate" && req.http.user-agent !~ "MSIE") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unknown algorithm
            unset req.http.Accept-Encoding;
        }
    }

    # Remove Google gclid parameters to minimize the cache objects
    set req.url = regsuball(req.url,"\?gclid=[^&]+$",""); # strips when QS = "?gclid=AAA"
    set req.url = regsuball(req.url,"\?gclid=[^&]+&","?"); # strips when QS = "?gclid=AAA&foo=bar"
    set req.url = regsuball(req.url,"&gclid=[^&]+",""); # strips when QS = "?foo=bar&gclid=AAA" or QS = "?foo=bar&gclid=AAA&bar=baz"

    # Static files caching
    if (req.url ~ "^/(pub/)?(media|static)/") {
        # Static files should not be cached by default
        #return (pass);

        # But if you use a few locales and don't use CDN you can enable caching static files by commenting previous line (#return (pass);) and uncommenting next 3 lines
        unset req.http.Https;
        unset req.http.X-Forwarded-Proto;
        unset req.http.Cookie;
    }

    #bypass for elasticsuite trackers
    if(req.url ~ "elasticsuite/tracker"){
        return (pass);
    }

    #bypass api requests
    if(req.url ~ "/rest/"){
        return (pass);
    }

    #bypass sale nav
    if(req.url ~ "saleNavMarkup.php"){
    return (pass);
    }

    return (hash);
}

sub vcl_hash {
#    if (req.http.cookie ~ "X-Magento-Vary=") {
#        hash_data(regsub(req.http.cookie, "^.*?X-Magento-Vary=([^;]+);*.*$", "\1"));
#    }

    # For multi site configurations to not cache each other's content
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }

    # To make sure http users don't see ssl warning
    if (req.http.X-Forwarded-Proto) {
        hash_data(req.http.X-Forwarded-Proto);
    }
    
}

sub vcl_backend_response {
    set beresp.ttl = 365d;
    set beresp.grace = 365d;

    if (beresp.http.content-type ~ "text") {
        set beresp.do_esi = true;
    }

    if (bereq.url ~ "\.js$" || beresp.http.content-type ~ "text") {
        set beresp.do_gzip = true;
    }

    if (beresp.http.X-Magento-Debug) {
        set beresp.http.X-Magento-Cache-Control = beresp.http.Cache-Control;
    }

    # cache only successfully responses and 404s
    if (beresp.status != 200 && beresp.status != 404) {
        set beresp.ttl = 0s;
        set beresp.uncacheable = true;
        return (deliver);
    } elsif (beresp.http.Cache-Control ~ "private") {
        set beresp.uncacheable = true;
        set beresp.ttl = 120s;
        return (deliver);
    }

    # validate if we need to cache it and prevent from setting cookie
    if (beresp.ttl > 0s && (bereq.method == "GET" || bereq.method == "HEAD")) {
        unset beresp.http.set-cookie;
    }

   # If page is not cacheable then bypass varnish for 2 minutes as Hit-For-Pass
   if (beresp.ttl <= 0s ||
       beresp.http.Surrogate-control ~ "no-store" ||
       (!beresp.http.Surrogate-Control &&
       beresp.http.Cache-Control ~ "no-cache|no-store") ||
       beresp.http.Vary == "*") {
        # Mark as Hit-For-Pass for the next 2 minutes
        set beresp.ttl = 120s;
        set beresp.uncacheable = true;
    }

    return (deliver);
}

sub vcl_deliver {
    #if (resp.http.X-Magento-Debug) {
        if (resp.http.x-varnish ~ " ") {
            set resp.http.X-Magento-Cache-Debug = "HIT";
            set resp.http.Grace = req.http.grace;
        } else {
            set resp.http.X-Magento-Cache-Debug = "MISS";
        }
    #} else {
    #    unset resp.http.Age;
    #}

    # Not letting browser to cache non-static files.
    if (resp.http.Cache-Control !~ "private" && req.url !~ "^/(pub/)?(media|static)/") {
        set resp.http.Pragma = "no-cache";
        set resp.http.Expires = "-1";
        set resp.http.Cache-Control = "no-store, no-cache, must-revalidate, max-age=0";
    }

    unset resp.http.X-Magento-Debug;
    unset resp.http.X-Magento-Tags;
    unset resp.http.X-Powered-By;
    unset resp.http.Server;
    unset resp.http.X-Varnish;
    unset resp.http.Via;
    unset resp.http.Link;
}

sub vcl_hit {
    if (obj.ttl >= 0s) {
        # Hit within TTL period
        return (deliver);
    }
    if (std.healthy(req.backend_hint)) {
        if (obj.ttl + 300s > 0s) {
            # Hit after TTL expiration, but within grace period
            set req.http.grace = "normal (healthy server)";
            return (deliver);
        } else {
            # Hit after TTL and grace expiration
            return (miss);
        }
    } else {
        # server is not healthy, retrieve from cache
        set req.http.grace = "unlimited (unhealthy server)";
        return (deliver);
    }
}
1

There are 1 best solutions below

0
Thijs Feryn On

The logs are you best friend here: varnishlog can answer that question in great detail. The challenge is (as always) reproducing the issues while you're looking at varnishlog output.

I'll formulate some strategies to make this a little easier.

Track the vxid

Every transaction has a vxid, this is a unique transaction ID that is exposed through the X-Varnish response header.

Log items are also identified by this transaction ID. If you know the value of the X-Varnish header for the miss, there is a chance you can look this up on the logs.

Imagine you're trying to track a transaction where the X-Varnish: 5 header was part of the response. This would result in the following varnishlog command:

sudo varnishlog -d -g request -q "vxid == 5"

As you can see the -q option is used to filter based on the vxid tag in the logs. The -d option would dump the content of the Varnish Shared Memory Logs rather than looking for new input.

Logs are stored in memory

However, since Varnish Shared Memory logs are stored in-memory to avoid performance degradation, you have to be lucky that the transactions you're looking for haven't been overwritten yet.

There are 2 ways to tackle this potential limitation:

  1. Increase the memory VSL buffer size
  2. Log to disk

Increasing the VSL buffer

Increasing the VSL buffer can be done in varnishadm. By default the vsl_space runtime parameter is set to 80 MB. If we would temporarily increase it to 500 MB, the following commands can be run on the Varnish server:

sudo varnishadm param.set vsl_space 500M
sudo varnishadm stop
sudo varnishadm start

A restart of the child process is required to change the value of vsl_space. A restart of varnishd would undo this setting.

The stop and start commands will result empty the cache. Keep that in mind.

Run the following command to verify the changes:

sudo varnishadm param.show vsl_space
vsl_space
        Value is: 500M [bytes]
        Default is: 80M
        Minimum is: 1M
        Maximum is: 4G

        The amount of space to allocate for the VSL fifo buffer in the
        VSM memory segment.  If you make this too small,
        varnish{ncsa|log} etc will not be able to keep up.  Making it
        too large just costs memory resources.

        NB: This parameter will not take any effect until the child
        process has been restarted.

Logging to disk

Another option is to store the logs on disk. While this has the ability to capture more relevant transactions, there are risks involved:

  • You could easily run out of disk space
  • Writing logs to disk can cause performance degradation

If you do want to write the logs to disk, I suggest storing them in a binary format so you can replay them later using varnishlog.

Here's the command you need:

sudo varnishlog -g session -a -w /var/log/vsl.log

If you already know the URL that you want to filter on, it would help to include a URL filter in your varnishlog command. Here's an example for the homepage:

sudo varnishlog -g session -q "ReqUrl eq '/'" -a -w /var/log/vsl.log

The URL filter will greatly reduce the strain on the server and make the entire thing a lot more lightweight.

Replaying the log in varnishlog can then be done using the following command:

sudo varnishlog -g request -r /var/log/vsl.log

Understanding what caused the cache miss

There is a whole range of VSL tags that can expose the issue your facing. The complete list can be found on https://varnish-cache.org/docs/trunk/reference/vsl.html.

In your case, I would suggest paying close attention to the following ones:

  • ExpBan
  • ExpKill
  • Hit
  • HitMiss
  • HitPass
  • TTL
  • VCL_call
  • VCL_return

Next steps

When you tracked down the complete logs for a transaction that ended up being a miss, don't hesitate to append the full varnishlog output for this transaction to your question.

I'll help you debug when you have the information.