How to do file versioning with a CDN and a load balancer?


I'm using a very simple CDN service. You point it at your website, and if you request a resource through their hostname, they cache it for you after the first call.

I use this for all my static content, like JavaScript files and images.

This all works perfectly, and I like that it has very little maintenance or setup cost.

The problem starts when rolling out new versions of JavaScript files. A JavaScript file automatically gets a new hash if the file changes.

Because the rollout across multiple instances is not simultaneous, a problem occurs. I tried to model it in this diagram:

Diagram

In words:

  • A request hits a server that already runs the new version
  • The returned page requests the JS file with the new version hash
  • The CDN correctly detects that the file is not cached
  • The CDN requests the original file with the new hash from the load balancer
  • The load balancer forwards the CDN's request to a random server, accidentally picking one that still has the old version
  • The CDN caches the old version under the new hash
  • Everyone gets served the old version from the CDN
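
The sequence above can be reproduced with a small in-memory model. This is only a sketch under stated assumptions: `Server`, `PullCdn`, and `pick_origin` are hypothetical names, build 350 is an assumed "old" version, and the model assumes the origin ignores the `?v=` query string and simply returns whatever file it has on disk.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Server:
    version: int  # build currently deployed on this instance (hypothetical)

    def serve_script(self, url: str) -> str:
        # Assumption: the query string is ignored, so the instance
        # returns its own copy of the file regardless of ?v=...
        return f"example.js contents from build {self.version}"


@dataclass
class PullCdn:
    pick_origin: Callable[[], Server]  # stands in for the load balancer
    cache: Dict[str, str] = field(default_factory=dict)

    def get(self, url: str) -> str:
        if url not in self.cache:
            # Cache miss: fetch from the origin once and keep the result.
            self.cache[url] = self.pick_origin().serve_script(url)
        return self.cache[url]


old, new = Server(version=350), Server(version=351)

# Mid-rollout: the load balancer happens to route the CDN's fetch to the old instance.
cdn = PullCdn(pick_origin=lambda: old)
url = "/scripts/example.js?v=351"
first = cdn.get(url)

# Rollout finished: every instance is new, but the URL is already cached.
cdn.pick_origin = lambda: new
later = cdn.get(url)

print(first)
print(later)
```

In this model both requests return the build 350 contents, because the cache key is the URL and the CDN never goes back to the origin for a URL it already holds.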

I know some ways to fix this, e.g. manually uploading files to a separate storage with the hash baked in. But this needs extra code and has more "moving parts", which makes maintenance more complicated.

I would prefer to have something that works as seamlessly as the normal CDN behavior. I guess this is a common problem for sites that are running on multiple instances, but I can't find a lot of information about this.

What is the common way to solve this?

Edit

I think another solution would be to somehow force the CDN to go to the same instance for the .js file as the one that served the original HTML file, but how?


There are 3 best solutions below

0
On BEST ANSWER

In the end I fixed this by referencing the CDN version only after a few minutes of runtime.

So if the runtime is less than 5 minutes, it refers to:

/scripts/example.js?v=351

After 5 minutes it refers to the CDN version:

https://cdn.example.com/scripts/example.js?v=351

After 5 minutes we are pretty sure that all instances are running the new version, so we don't accidentally cache an old version under the new hash.

The downside is that if you redeploy at a very busy moment, you lose the advantage of the CDN for those first minutes, but I haven't seen a better alternative yet.
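
A minimal sketch of that switch, assuming the application records its own start time and builds script URLs through one helper. The names `script_url`, `CDN_HOST`, and `WARMUP_SECONDS` are hypothetical; the `now` and `started_at` parameters exist only to make the function easy to test.

```python
import time
from typing import Optional

CDN_HOST = "https://cdn.example.com"
WARMUP_SECONDS = 5 * 60  # time we assume a full rollout takes
PROCESS_STARTED_AT = time.monotonic()  # recorded once when the instance boots


def script_url(
    path: str,
    version: int,
    now: Optional[float] = None,
    started_at: float = PROCESS_STARTED_AT,
) -> str:
    """Return a local URL during warm-up and a CDN URL afterwards."""
    current = time.monotonic() if now is None else now
    versioned = f"{path}?v={version}"
    if current - started_at < WARMUP_SECONDS:
        # Other instances may still run the old build: bypass the CDN.
        return versioned
    # All instances should be on the new build: safe to let the CDN cache it.
    return f"{CDN_HOST}{versioned}"


print(script_url("/scripts/example.js", 351, now=100.0, started_at=0.0))
print(script_url("/scripts/example.js", 351, now=300.0, started_at=0.0))
```

The 5-minute window is a guess about how long a rollout takes; if a deployment can run longer than that, the window needs to be widened accordingly.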

0
On

Here are a few ideas from my solutions in the past, though the CDN you are using may rule out some of these:

  1. Exclude .js files from the CDN caching service, which prevents them from being cached in the first place.
  2. Poke the CDN with a request to invalidate the cache for a specific file at the time of release.
  3. In your build/deploy script, change the name of the .js file and reference the new file in your HTML.
  4. Use query parameters after the .js file name. The server ignores them, but the CDN caches the file under a different URL, e.g. /mysite/myscript.js?build1234
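
Ideas 3 and 4 both come down to deriving a new URL whenever the file changes. A rough sketch of how a build script might do that from the file contents; the helper names `content_hash`, `renamed`, and `with_query` are hypothetical, and the 8-character SHA-256 prefix is an arbitrary choice.

```python
import hashlib
from pathlib import PurePosixPath


def content_hash(data: bytes, length: int = 8) -> str:
    # Short hex digest of the file contents; changes whenever the file changes.
    return hashlib.sha256(data).hexdigest()[:length]


def renamed(path: str, data: bytes) -> str:
    # Idea 3: bake the hash into the file name, e.g. myscript.<hash>.js
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{content_hash(data)}{p.suffix}"))


def with_query(path: str, data: bytes) -> str:
    # Idea 4: keep the file name and append the hash as a query parameter.
    return f"{path}?v={content_hash(data)}"


old_build = b"console.log('old');"
new_build = b"console.log('new');"

print(renamed("/mysite/myscript.js", old_build))
print(renamed("/mysite/myscript.js", new_build))
print(with_query("/mysite/myscript.js", new_build))
```

Renaming the file has one property the query-string variant lacks: an instance that still runs the old build does not have the new file name at all, so it cannot return old contents for it.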
3
On

The problem with this kind of issue is that cache control resides on the browser side, so you cannot do much from the server side.

The most common way I know is basically the one you mention about adding some hash to the file names or the URLs you use to get them.

The thing is that you should not do this manually. You should use a web application builder, like Webpack, to automate this process; the details will depend on the technologies you are using. I saw this for the first time using GWT 13 years ago, and all the recent projects I worked on, using AngularJS or React, were integrated with builders that do what you need automatically.

Once it's implemented, your users will get the latest version, and resources will be cached correctly to speed up your site.

If you can also automate the full pipeline to remove the old resources from the CDN once their configured expiration has been reached, you've touched the sky.