What does "Accept-Encoding: *" mean?


This page on Mozilla Developer Network, which is usually not too bad in quality, states:

* matches any content encoding not already listed in the header. This is the default value if the header is not present. It doesn't mean that any algorithm is supported; merely that no preference is expressed.

Now I found that Elasticsearch goes ahead and sends gzip when I tell it Accept-Encoding: * but plain data when I leave out the header.

It seems to me that this means that both sentences are wrong:

This is the default value if the header is not present.

In that case the behavior should be identical whether Accept-Encoding: * or no header at all is given.

It doesn't mean that any algorithm is supported; merely that no preference is expressed.

It seems that to Elasticsearch it means exactly that: It's fine to send gzip.

Am I misunderstanding what they mean in MDN? Is the information on that page simply wrong (it has an Edit button after all)? Or is Elasticsearch doing something it's not supposed to do?


There is 1 best solution below


And what is the wrong behaviour here?

Edit: the exact expected behaviour is defined in RFC 2616 (obsolete), section 14.3 (https://www.rfc-editor.org/rfc/rfc2616#section-14.3), and in RFC 7231, section 5.3.4 (https://www.rfc-editor.org/rfc/rfc7231#section-5.3.4).

My understanding is that if you (the HTTP client) tell Elasticsearch that you can accept any content encoding, then the server is free to choose whatever encoding it prefers for the response (whether that is plain text or gzip). The client should then look at the Content-Encoding response header to handle the data correctly.
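To illustrate the client side of this, here is a minimal Python sketch of decoding a body based on the Content-Encoding response header. The function name `decode_body` is made up for this example, and it assumes the header names at most one coding (the header may in fact list several); treat it as an illustration rather than a complete client.

```python
import gzip
import zlib


def decode_body(content_encoding, body):
    """Decode a response body according to its Content-Encoding value.

    `content_encoding` is the raw header value, or None if the header is
    absent. Assumption: a single coding, not a comma-separated list.
    """
    coding = (content_encoding or "identity").strip().lower()
    if coding == "identity":
        # No transformation was applied; the body is already plain.
        return body
    if coding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if coding == "deflate":
        # In HTTP, "deflate" means the zlib container format.
        return zlib.decompress(body)
    # A client that sent "*" can end up here with a coding it cannot handle.
    raise ValueError(f"unsupported Content-Encoding: {coding}")
```

The last branch is the practical risk of sending `*`: the server may legitimately pick a coding that the client has no decoder for.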

Looking at the two sentences more closely:

This is the default value if the header is not present.

If the Accept-Encoding header is not present, that is equivalent to sending Accept-Encoding: *, which means that the server can use any content encoding it wishes. It does not mean that the server must always use the same encoding scheme: it means the server is free to choose the one it wants.

It doesn't mean that any algorithm is supported; merely that no preference is expressed.

This sentence applies to the client (not the server). When using *, the client just says to the server "oh, whatever encoding you will use, that's fine by me. Feel free to use any you want."

In both cases (no Accept-Encoding header or Accept-Encoding: *), plain text, gzip or any other encoding scheme is a legitimate response. As for the Elasticsearch implementation, my guess is the following:

  • As the server, if I receive no Accept-Encoding header, I could assume that the client does not even know about content encoding. It is safer to use plain text.
  • As the server, if I receive an Accept-Encoding: * header, that means the client knows about content encoding and is really willing to accept anything. Well, gzip is a good choice to save bandwidth, and it is well supported.
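The guess above can be sketched in Python as follows. This is a hypothetical illustration, not Elasticsearch's actual code: the function name `choose_encoding` is invented, and it only ever chooses between gzip and identity. It treats a missing header as "send plain", and lets an explicit `gzip` entry take precedence over `*`.

```python
def choose_encoding(accept_encoding):
    """Pick a response coding from the raw Accept-Encoding value.

    `accept_encoding` is None when the request carried no such header.
    Simplified: the server only knows "gzip" and "identity".
    """
    if accept_encoding is None:
        # No header: the client may not know about content codings at all.
        return "identity"

    # Parse items of the form "coding" or "coding;q=0.5".
    accepted = {}
    for item in accept_encoding.split(","):
        item = item.strip()
        if not item:
            continue
        coding, _, params = item.partition(";")
        params = params.strip()
        q = 1.0
        if params.lower().startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        accepted[coding.strip().lower()] = q

    # An explicit "gzip" entry wins; otherwise fall back to the wildcard.
    gzip_q = accepted.get("gzip", accepted.get("*", 0.0))
    return "gzip" if gzip_q > 0 else "identity"
```

With this sketch, `choose_encoding(None)` gives `"identity"` and `choose_encoding("*")` gives `"gzip"`, which matches the behaviour observed in the question while still being a permitted choice in both cases.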

Note that this is largely my interpretation: only the original Elasticsearch developers could give an authoritative answer.

If you support only a limited set of content encodings, you should not use *. Instead, explicitly list the encodings you support.
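As a small illustration of listing encodings explicitly, the following Python sketch builds an Accept-Encoding value from the codings a client can actually decode. The helper name `build_accept_encoding` and the example weights are assumptions made for this example.

```python
def build_accept_encoding(supported):
    """Build an Accept-Encoding value from (coding, q) pairs.

    `supported` lists only the codings the client can really decode,
    each with a preference weight between 0 and 1.
    """
    parts = []
    for coding, q in supported:
        if q == 1:
            # A weight of 1 is the default and can be omitted.
            parts.append(coding)
        else:
            parts.append(f"{coding};q={q:g}")
    return ", ".join(parts)


# Example: prefer gzip, accept deflate as a fallback.
header_value = build_accept_encoding([("gzip", 1.0), ("deflate", 0.5)])
```

The resulting value can be sent as the Accept-Encoding request header in place of `*`.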