Java SpringFramework HTTPRequest unicode character problem

I'm trying to fetch the content of an online page through Spring Framework using this method:

public <T>HttpReply<T> httpRequest(final String uri, final HttpMethod method,
            final Class<T> expectedReturnType, final List<HttpMessageConverter<?>> messageConverters,
            final HashMap<String, Object> formValues, final HashMap<String, Object> headers)
                    throws HttpNullUriOrMethodException, HttpInvocationException {
        try {

            redirectInfo.set(new AbstractMap.SimpleEntry<String, String>(uri, ""));

            if (method==null) {
                throw new HttpNullUriOrMethodException("HttpMethod cannot be null.");
            }

            if (!StringUtils.hasText(uri)) {
                throw new HttpNullUriOrMethodException("URI cannot be null or empty.");
            }

            HttpRequestExecutingMessageHandler handler =
                    buildMessageHandler(uri, method, expectedReturnType, messageConverters);

            // Default queue for reply
            QueueChannel replyChannel = new QueueChannel();
            handler.setOutputChannel(replyChannel);

            // Exec Http Request
            Message<?> message = buildMessage(formValues, headers);
            try {
                handler.handleMessage(message);
            }
            catch (Exception e) {
                // Include the underlying message so the real failure is not lost
                throw new HttpInvocationException("Error Handling HTTP Message: " + e.getMessage());
            }

            // Get Response
            Message<?> response = replyChannel.receive();
            if (response == null) {
                throw new HttpInvocationException("Error: communication is interrupted.");
            }

            // Read response Headers
            String[] usefulHeaders = readUsefulHeaders(response.getHeaders());

            // Return payload
            Object respObj = response.getPayload();             

            if (expectedReturnType != null && !expectedReturnType.isInstance(respObj)) {
                throw new HttpInvocationException("Error: response payload is instance of "
                         + respObj.getClass().getName() + ". Expected: " + expectedReturnType.getName());
            }

            HttpReply<T> retVal = new HttpReply<>();
            retVal.setPayload((T)respObj);

            String valRedirect = uri;
            if (redirectInfo.get().getKey().equals(uri)) {
                if (StringUtils.hasText(redirectInfo.get().getValue())) {
                    valRedirect = redirectInfo.get().getValue();
                }
            }
            else {
                throw new HttpInvocationException("ERROR READING REDIRECT INFORMATION!!! Original URI: "
                        + uri + " - FOUND URI: " + redirectInfo.get().getKey());
            }
            retVal.setActualLocation(valRedirect);
            return retVal;
        }
        finally {
            redirectInfo.remove();
        }
    }

which gets called like this:

HttpReply<byte[]> feedContent = httpUtil.httpRequest(rssFeed.getUrl(), HttpMethod.GET, byte[].class, null,
                null, null);

rawXml = new String(feedContent.getPayload());

Now, this method works fine, except that sometimes rawXml contains �, especially when reading from pages whose charset is not UTF-8.

I tried calling handler.setCharset(StandardCharsets.ISO_8859_1) on the handler, and I tried changing the message header so that it would contain "contentType=application/xml; charset=ISO-8859-1".

I also tried converting the text once it is in rawXml, but sometimes the message is neither UTF-8 nor ISO-8859-1, so the conversion doesn't restore the missing characters.
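
One thing worth noting about the calling code: new String(byte[]) decodes with the platform default charset, so any feed in a different encoding can produce �. Since the payload is already a byte[], one possible direction is to pick the charset from the bytes themselves before building the string. The following is only a sketch under assumptions: the class XmlCharsetSniffer and its methods detect and decode are hypothetical names, not part of Spring, and it only looks at a byte order mark and the encoding attribute of the XML declaration. It ignores any charset given in the HTTP Content-Type header, and it falls back to UTF-8 when nothing is declared.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Spring class): guesses the charset of raw XML bytes.
class XmlCharsetSniffer {

    // Matches the encoding attribute of an XML declaration, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
    private static final Pattern ENCODING = Pattern.compile(
            "<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    static Charset detect(byte[] raw) {
        // 1. A byte order mark, when present, identifies the encoding family.
        if (raw.length >= 3 && (raw[0] & 0xFF) == 0xEF
                && (raw[1] & 0xFF) == 0xBB && (raw[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (raw.length >= 2) {
            int b0 = raw[0] & 0xFF;
            int b1 = raw[1] & 0xFF;
            if ((b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE)) {
                // The UTF-16 decoder reads the BOM itself to choose the byte order.
                return StandardCharsets.UTF_16;
            }
        }

        // 2. Otherwise read the first bytes as ISO-8859-1 (every byte maps to a char,
        //    so this cannot fail) and look for the declared encoding.
        String head = new String(raw, 0, Math.min(raw.length, 256), StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(head);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: fall through to the default.
            }
        }

        // 3. Assumption: default to UTF-8 when nothing is declared.
        return StandardCharsets.UTF_8;
    }

    static String decode(byte[] raw) {
        String text = new String(raw, detect(raw));
        // Drop a leading BOM character if the decoder left one in place.
        if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
            text = text.substring(1);
        }
        return text;
    }
}
```

With such a helper, the call site would become rawXml = XmlCharsetSniffer.decode(feedContent.getPayload()); instead of new String(feedContent.getPayload()). Feeds that declare one encoding but are actually served in another would still come out wrong, so this is a starting point rather than a complete fix.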
