Is there much advantage to using ETags in HTTP?

When I looked at the Flask (Werkzeug) source code around ETags, I found that it generates a response object, computes an ETag from the response data with SHA-1, compares it with the If-None-Match header of the request, and returns 304 or 200. So the work of accessing the DB and building the response is the same whether or not an ETag is used, and the only benefit of the ETag is not having to send the data to the client.

Of course, if you have a large amount of data, there are advantages, but if the data is not that large, is it considered to be of little use?

Instead of re-creating the etag from the response for each request, I thought it would be better to store the etag in redis or the server memory, etc., when there is a change in the object that is the target of the request, and compare it with the pre-stored etag when the request is made.

Is this way of caching not often used?

There is 1 best solution below

An ETag is an opaque, server-generated sequence of bytes. The HTTP standard lists a few possible implementations:

"For example, a resource that has implementation-specific versioning applied to all changes might use an internal revision number, perhaps combined with a variance identifier for content negotiation, to accurately differentiate between representations. Other implementations might use a collision-resistant hash of representation content, a combination of various file attributes, or a modification timestamp that has sub-second resolution."

So you can see that generating a response and computing its hash is just one of many possible approaches. The reason that Flask, Django, and other frameworks take that approach is that it's the only one that doesn't require any application-specific knowledge. It's guaranteed to be accurate no matter how a response is generated. (Nginx, when serving static files, instead builds the ETag from file attributes: the modification time and the size.)
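As a rough illustration of the hash approach, here is a minimal Python sketch using only the standard library. It is not Werkzeug's actual code: `build_body` and `respond_with_hashed_etag` are hypothetical names, and the If-None-Match handling is simplified to a single exact match (a real server would also handle `*`, comma-separated lists, and weak validators).

```python
import hashlib
from typing import Dict, Optional, Tuple


def build_body() -> bytes:
    # Hypothetical stand-in for the expensive part: DB access plus serialization.
    return b'{"id": 1, "name": "example"}'


def respond_with_hashed_etag(
    if_none_match: Optional[str],
) -> Tuple[int, Dict[str, str], bytes]:
    # The body is always built, because the ETag is derived from it.
    body = build_body()
    etag = '"' + hashlib.sha1(body).hexdigest() + '"'

    # Simplified comparison: one ETag, exact match only (an assumption).
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

Note that `build_body` runs on every request, which is exactly the cost the question describes; only the transfer of the body is saved on a 304.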

So, by all means, if you can use application-specific knowledge (like version numbers) to compute an ETag without computing a response, then feel free to do so and enjoy the additional efficiency. In the absence of that, though, the hash method is pretty good: it has low cost, potentially large benefits (for large responses), and requires no extra work from the developer.