I'm using org.apache.commons.httpclient.HttpClient and need to set the response encoding myself (for some reason the server returns an incorrect encoding in Content-Type). My approach is to get the response as raw bytes and convert them to a String with the desired encoding. I'm wondering if there is a better way to do this (e.g. by configuring HttpClient). Thanks for any suggestions.
Set response encoding with HttpClient 3.1
Asked by michal.kreuzman

A few notes:
1. The server serves the data, so it is up to the server to serve it in an appropriate format: the response encoding is set by the server, not the client. However, the client can suggest which format it would like via the Accept and Accept-Charset request headers:
   Accept: text/plain
   Accept-Charset: utf-8
   However, HTTP servers usually do not convert between formats.
2. If option 1 does not work, look at the server's configuration.
3. When a String is sent as raw bytes (and it always is, because bytes are what networks transmit), an encoding is always involved. Since the server produces these raw bytes, it defines the encoding. So you cannot take the raw bytes and pick an encoding of your choice to create a String: you must use the encoding that was used when the String was converted to bytes.
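Point 3 can be seen with plain JDK code, no HttpClient involved. This is a minimal sketch that assumes the server encoded the text as UTF-8; the names CharsetMismatchDemo and roundTrip are made up for illustration. Decoding the same bytes as ISO-8859-1 garbles every non-ASCII character, while decoding them with the charset actually used recovers the original text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    // Encodes text the way a server would, then decodes it the way a client would.
    static String roundTrip(String text, Charset serverCharset, Charset clientCharset) {
        byte[] wire = text.getBytes(serverCharset); // what actually travels over the network
        return new String(wire, clientCharset);     // the client's interpretation of those bytes
    }

    public static void main(String[] args) {
        // "cafe" with an e-acute, written as an escape so the source stays ASCII
        String text = "caf\u00e9";

        // Server used UTF-8, client assumes ISO-8859-1:
        // U+00E9 turns into two characters, U+00C3 U+00A9
        String wrong = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        // Client uses the charset the server actually used: text survives intact
        String right = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        System.out.println(wrong.equals("caf\u00c3\u00a9")); // true
        System.out.println(right.equals(text));              // true
    }
}
```

This is also why decoding the raw bytes yourself, as in the question, only helps if you know which charset the server really used.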

Disclaimer: I don't really know HttpClient; I'm only reading the API.
I would use the execute method that returns an HttpResponse, then call .getEntity().getContent(). This is a pure byte stream, so if you want to ignore the encoding reported by the server, you can simply wrap your own InputStreamReader around it.
Okay, it looks like I had the wrong version (obviously, there are too many HttpClient classes out there).
But the same approach works, just on different classes: HttpMethod has a getResponseBodyAsStream() method, around which you can wrap your own InputStreamReader. (Or, if the body is not too big, get the whole byte array at once and convert it to a String, as you wrote.)
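A minimal sketch of that wrapping, using only JDK classes so it runs without HttpClient on the classpath. ForcedCharsetReader and readBody are hypothetical names; the ByteArrayInputStream in main stands in for the stream returned by getResponseBodyAsStream(), and UTF-8 is assumed to be the charset the server really uses.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ForcedCharsetReader {
    // Reads the whole stream as text with the given charset, ignoring
    // whatever charset the server declared in its Content-Type header.
    static String readBody(InputStream body, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader and the underlying stream
        try (Reader reader = new InputStreamReader(body, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the response stream: UTF-8 bytes ("Gruesse" with u-umlaut
        // and sharp s) that a misconfigured server might label as ISO-8859-1.
        byte[] raw = "Gr\u00fc\u00dfe".getBytes(StandardCharsets.UTF_8);
        String text = readBody(new ByteArrayInputStream(raw), StandardCharsets.UTF_8);
        System.out.println(text.equals("Gr\u00fc\u00dfe")); // true
    }
}
```

With HttpClient 3.1 this would be called as readBody(method.getResponseBodyAsStream(), Charset.forName("UTF-8")) after client.executeMethod(method), with method.releaseConnection() in a finally block. For a small body, new String(method.getResponseBody(), "UTF-8") does the same in one step.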
I think trying to change the response and letting the HttpClient analyze it is not the right way here.
I suggest sending a message to the server administrator/webmaster about the wrong charset, though.

I don't think there's a better answer using the HttpClient 3.x APIs. The HTTP 1.1 spec says clearly that a client "must" respect the character set specified in the response header, and use ISO-8859-1 if no character set is specified. The HttpClient APIs are designed on the assumption that the programmer wants to conform to the HTTP specs. Obviously, you need to break the rules in the spec so that you can talk to the non-compliant server. Nevertheless, this is not a use case that the API designers saw a need to support explicitly.
If you were using HttpClient 4.x, you could write your own ResponseHandler that converts the response's HttpEntity into a String using a charset of your choosing, ignoring the response message's notional character set.
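The spec rule described above, plus the override the asker needs, fits in a small JDK-only helper. This is a sketch under stated assumptions: ResponseCharset, choose and decode are illustrative names, the Content-Type parsing is deliberately simple (it only looks for a charset parameter), and the ISO-8859-1 fallback follows the HTTP 1.1 rule quoted in the answer.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ResponseCharset {
    // Picks the charset for a response body. A non-null override always wins
    // (the workaround for a server that mislabels its responses); otherwise the
    // charset parameter of Content-Type is used, falling back to ISO-8859-1.
    static Charset choose(String contentType, Charset override) {
        if (override != null) {
            return override;
        }
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).trim();
                    // Strip optional quotes, e.g. charset="utf-8"
                    if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                        name = name.substring(1, name.length() - 1);
                    }
                    try {
                        if (Charset.isSupported(name)) {
                            return Charset.forName(name);
                        }
                    } catch (IllegalArgumentException e) {
                        // Illegal charset name: ignore it and use the fallback below
                    }
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    // Decodes raw body bytes with the charset chosen above.
    static String decode(byte[] body, String contentType, Charset override) {
        return new String(body, choose(contentType, override));
    }

    public static void main(String[] args) {
        byte[] body = {(byte) 0xC3, (byte) 0xA9}; // UTF-8 bytes for U+00E9
        // Server claims ISO-8859-1, but we know it really sends UTF-8
        String text = decode(body, "text/html; charset=ISO-8859-1", StandardCharsets.UTF_8);
        System.out.println(text.equals("\u00e9")); // true
    }
}
```

In HttpClient 3.1 the header value would come from method.getResponseHeader("Content-Type") (which returns null if the header is absent) and the bytes from method.getResponseBody(). In a 4.x ResponseHandler, EntityUtils.toByteArray(response.getEntity()) yields the bytes. Note that EntityUtils.toString(entity, charset) is not an override: it uses the given charset only when the entity declares none.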