I'm trying to fetch the content of a web page through Spring Framework using this method:
public <T> HttpReply<T> httpRequest(final String uri, final HttpMethod method,
        final Class<T> expectedReturnType, final List<HttpMessageConverter<?>> messageConverters,
        final HashMap<String, Object> formValues, final HashMap<String, Object> headers)
        throws HttpNullUriOrMethodException, HttpInvocationException {
    try {
        redirectInfo.set(new AbstractMap.SimpleEntry<String, String>(uri, ""));
        if (method == null) {
            throw new HttpNullUriOrMethodException("HttpMethod cannot be null.");
        }
        if (!StringUtils.hasText(uri)) {
            throw new HttpNullUriOrMethodException("URI cannot be null or empty.");
        }
        HttpRequestExecutingMessageHandler handler =
                buildMessageHandler(uri, method, expectedReturnType, messageConverters);
        // Default queue for reply
        QueueChannel replyChannel = new QueueChannel();
        handler.setOutputChannel(replyChannel);
        // Exec Http Request
        Message<?> message = buildMessage(formValues, headers);
        try {
            handler.handleMessage(message);
        }
        catch (Exception e) {
            throw new HttpInvocationException("Error Handling HTTP Message.");
        }
        // Get Response
        Message<?> response = replyChannel.receive();
        if (response == null) {
            throw new HttpInvocationException("Error: communication is interrupted.");
        }
        // Read response Headers
        String[] usefulHeaders = readUsefulHeaders(response.getHeaders());
        // Return payload
        Object respObj = response.getPayload();
        if (expectedReturnType != null && !expectedReturnType.isInstance(respObj)) {
            // Report the expected type itself, not java.lang.Class
            throw new HttpInvocationException("Error: response payload is instance of "
                    + respObj.getClass().getName() + ". Expected: " + expectedReturnType.getName());
        }
        HttpReply<T> retVal = new HttpReply<>();
        retVal.setPayload((T) respObj);
        String valRedirect = uri;
        if (redirectInfo.get().getKey().equals(uri)) {
            if (StringUtils.hasText(redirectInfo.get().getValue())) {
                valRedirect = redirectInfo.get().getValue();
            }
        }
        else {
            throw new HttpInvocationException("ERROR READING REDIRECT INFORMATION!!! Original URI: "
                    + uri + " - FOUND URI: " + redirectInfo.get().getKey());
        }
        retVal.setActualLocation(valRedirect);
        return retVal;
    }
    finally {
        redirectInfo.remove();
    }
}
It gets called like this:
HttpReply<byte[]> feedContent = httpUtil.httpRequest(rssFeed.getUrl(), HttpMethod.GET, byte[].class,
        null, null, null);
rawXml = new String(feedContent.getPayload());
This works fine, except that rawXml sometimes contains � (the Unicode replacement character), especially when reading from a page whose charset is not UTF-8.
I tried calling handler.setCharset(StandardCharsets.ISO_8859_1) and changing the message header so that it would contain "contentType=application/xml; charset=ISO-8859-1", but neither made a difference.
I also tried converting the text once it is in rawXml, but some pages are neither UTF-8 nor ISO-8859-1, so the conversion doesn't restore the missing characters.
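To illustrate what I think is needed, here is a minimal sketch of decoding the raw bytes per document instead of with one fixed charset. It assumes the payload is XML whose declaration names its encoding, and it falls back to UTF-8 otherwise. The class XmlCharsetSniffer and its methods are hypothetical names of mine, not part of Spring, and the sketch does not handle byte order marks, UTF-16, or a charset given in the HTTP Content-Type header.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: picks a charset from the XML declaration, if there is one.
final class XmlCharsetSniffer {

    // Matches e.g. <?xml version="1.0" encoding="ISO-8859-1"?> at the very start
    private static final Pattern ENCODING = Pattern.compile(
            "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z0-9._:-]+)[\"']");

    private XmlCharsetSniffer() {
    }

    static Charset detect(byte[] payload) {
        // The declaration is ASCII-only in ASCII-compatible encodings, so
        // reading the first bytes as ISO-8859-1 cannot corrupt it.
        int len = Math.min(payload.length, 200);
        String head = new String(payload, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(head);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            }
            catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: use the fallback below
            }
        }
        // Assumption: UTF-8 when nothing is declared
        return StandardCharsets.UTF_8;
    }

    static String decode(byte[] payload) {
        // Always pass an explicit charset; new String(byte[]) alone uses the
        // platform default, which may not match the page.
        return new String(payload, detect(payload));
    }
}
```

With this, the call site would become rawXml = XmlCharsetSniffer.decode(feedContent.getPayload()); instead of new String(feedContent.getPayload()).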