What are the optimal cache-related HTTP headers for content that can change?


We have several files which are served through HTTP and which change from time to time.

What are the caching-related HTTP headers that we should return in the HTTP response to optimize browser load speed while at the same time forcing the browser to validate that it has the latest version of the file?

We are already setting an "Expires" header with a date in the past (there seems to be consensus on this point).

But then some people recommend setting this header:

Cache-Control: no-cache, no-store, must-revalidate

But the problem with this header is that it prevents the browser from keeping a local copy of the file, so the file is downloaded every time with a 200 response code, even if it hasn't changed.

If I just use:

Cache-Control: no-cache

Then the browser (at least Firefox 14 and Chrome 20) keeps a local copy, sends If-Modified-Since and If-None-Match headers, and the server returns a 304 code so the file contents are not downloaded again. This is the optimal behavior for these files that can change at any time.
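For illustration, here is a minimal sketch of that flow using Python's standard http.server (the file path is a hypothetical example): the response carries Cache-Control: no-cache plus an ETag, and a request whose If-None-Match matches is answered with 304 and no body.

    # Minimal sketch: serve one file with "Cache-Control: no-cache" and honor
    # If-None-Match so that an unchanged file is answered with 304 (no body).
    import hashlib
    from http.server import BaseHTTPRequestHandler, HTTPServer

    FILE_PATH = "data/report.json"  # hypothetical example file

    class RevalidatingHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            with open(FILE_PATH, "rb") as f:
                body = f.read()
            etag = '"%s"' % hashlib.sha1(body).hexdigest()

            # Client already has this exact version: answer 304, send no body.
            if self.headers.get("If-None-Match") == etag:
                self.send_response(304)
                self.send_header("ETag", etag)
                self.send_header("Cache-Control", "no-cache")
                self.end_headers()
                return

            # Otherwise send the full file; "no-cache" lets the browser store
            # it but forces revalidation before every reuse.
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.send_header("ETag", etag)
            self.send_header("Cache-Control", "no-cache")
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8000), RevalidatingHandler).serve_forever()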

The problem is that I don't know if just setting "no-cache" is enough to force all browsers (including old but still-used versions) and proxy servers to revalidate their locally cached copy with the server.

Finally, what about the Pragma: no-cache header? Should it be included in the HTTP response too?


There are 3 answers below.

Answer 1:

The best way, though it may not fit your needs 100%, is:

Cache-Control: max-age=315360000, public
Expires: Tue, 23 Aug 2022 10:53:13 GMT

And give the file a "content-dependent filename" such as stylesheet_v32.css. As soon as the content changes, change the filename and the references to it, and the browser gets the latest version. If the filename stays the same, the browser doesn't need to request it again.
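A sketch of how such a content-dependent filename could be produced at build time (the helper name and file paths are illustrative assumptions, not part of this answer):

    # Sketch of build-time fingerprinting: copy stylesheet.css to a name that
    # changes whenever the content changes, e.g. stylesheet.3f2a9c1b.css.
    import hashlib
    import shutil

    def fingerprinted_copy(src: str) -> str:
        with open(src, "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest()[:8]
        name, ext = src.rsplit(".", 1)
        dst = f"{name}.{digest}.{ext}"
        shutil.copyfile(src, dst)
        return dst  # reference this name in the HTML

    print(fingerprinted_copy("stylesheet.css"))  # e.g. stylesheet.3f2a9c1b.css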

This is safe and consistent across browsers.

Relying on Cache-Control: no-cache and the browsers keeping a local copy anyway is something I would rather not do.

Answer 2:

The Google developers documentation has a nice section on HTTP caching and provides some useful patterns.

For instance, it has a flow chart for defining an optimal cache-control policy.


Further, it describes a pattern of adding a fingerprint to file names and setting a longer expiration time, such as a year:

  • Locally cached responses are used until the resource 'expires'
  • Embedding a file content fingerprint in the URL enables us to force the client to update to a new version of the response
  • Each application needs to define its own cache hierarchy for optimal performance

[Diagram: example caching policies for an HTML page and its CSS, JavaScript, and image resources]

The ability to define per-resource caching policies allows us to define “cache hierarchies” that control not only how long each resource is cached for, but also how quickly new versions are seen by visitors. For example, let’s analyze the above example:

  • The HTML is marked with “no-cache”, which means that the browser will always revalidate the document on each request and fetch the latest version if the contents change. Also, within the HTML markup we embed fingerprints in the URLs for CSS and JavaScript assets: if the contents of those files change, then the HTML of the page will change as well and a new copy of the HTML response will be downloaded.
  • The CSS is allowed to be cached by browsers and intermediate caches (e.g. a CDN), and is set to expire in 1 year. Note that we can use the “far future expires” of 1 year safely because we embed the file fingerprint in its filename: if the CSS is updated, the URL will change as well.
  • The JavaScript is also set to expire in 1 year, but is marked as private, perhaps because it contains some private user data that the CDN shouldn’t cache.
  • The image is cached without a version or unique fingerprint and is set to expire in 1 day.

The combination of ETag, Cache-Control, and unique URLs allows us to deliver the best of all worlds: long-lived expiry times, control over where the response can be cached, and on-demand updates.
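To make that concrete, here is a sketch of how the policies from the example above might be expressed in server code (the mapping and helper are illustrative assumptions, not from the Google documentation):

    # Sketch: map resource types to the Cache-Control policies from the
    # example above (HTML revalidated, fingerprinted CSS/JS cached for a
    # year, images cached for a day). The mapping is illustrative only.
    CACHE_POLICIES = {
        "html": "no-cache",                  # always revalidate the document
        "css":  "public, max-age=31536000",  # fingerprinted, CDN-cacheable
        "js":   "private, max-age=31536000", # fingerprinted, browser-only
        "png":  "max-age=86400",             # no fingerprint, 1 day
    }

    def cache_control_for(path: str) -> str:
        ext = path.rsplit(".", 1)[-1].lower()
        return CACHE_POLICIES.get(ext, "no-cache")

    print(cache_control_for("style.3f2a9c1b.css"))  # public, max-age=31536000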

Answer 3:

I found two ways to force a cache re-check by the client:

Cache-Control: max-age=0, must-revalidate
Expires: Thu, 01 Jan 1970 00:00:00 GMT

This works with at least Firefox; I would expect IE and Chrome to react properly as well. It should also work with older HTTP/1.0 browsers and proxies, since they understand the Expires header even if they ignore Cache-Control.

With HTTP/1.1, you can use an ETag. In that case the must-revalidate option is not necessary, because having the ETag is enough to make the client behave as if must-revalidate were there:

Cache-Control: max-age=0
ETag: 123
Expires: Thu, 01 Jan 1970 00:00:00 GMT

This tells the client to cache the data with ETag 123 and to recheck with the server every time it needs a copy of that data. You can then reply with 304 Not Modified when nothing has changed.
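To see this exchange from the client side, you can replay the conditional request manually; a small sketch (host, path, and the ETag value 123 are hypothetical):

    # Sketch: send a conditional GET. If the server's ETag still matches,
    # the status is 304 and the body is empty; otherwise 200 with a body.
    import http.client

    conn = http.client.HTTPConnection("example.com")
    conn.request("GET", "/data", headers={"If-None-Match": "123"})
    resp = conn.getresponse()
    print(resp.status)       # 304 if unchanged, 200 otherwise
    print(len(resp.read()))  # 0 for a 304
    conn.close()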

The two options that you definitely cannot use are no-cache and no-store.

If you want to prevent intermediate caches from caching the data, make sure to add private to the Cache-Control options.

As an interesting variation, you can also use a small max-age of, for example, a few minutes, to let the client cache the data for that amount of time and then send a conditional GET, which you can answer with a 304:

Cache-Control: max-age=300
ETag: 123
Expires: Sun, 29 Mar 2015 15:05:00 GMT

In this case, the browser is expected not to check for new data for 5 minutes. After that, it sends you If-None-Match: 123 again.
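A sketch of how a server could emit that combination consistently, keeping Expires in sync with max-age (the 300-second window and the ETag value are illustrative):

    # Sketch: build a consistent header set for a 5-minute cache window.
    import time
    from email.utils import formatdate

    MAX_AGE = 300  # seconds

    def short_lived_headers(etag: str) -> dict:
        return {
            "Cache-Control": f"max-age={MAX_AGE}",
            "ETag": etag,
            # Expires mirrors max-age for HTTP/1.0 caches that only
            # understand the Expires header.
            "Expires": formatdate(time.time() + MAX_AGE, usegmt=True),
        }

    print(short_lived_headers('"123"'))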