Encoding issue causing difference in strings


I have a legacy service that returns an XML string from the database. In one particular scenario, the string contains the character xA0 (a non-breaking space). I recently moved this service to a new Windows 10 machine, and when I wrote the string to a file, the XML file became unparseable. When I opened the old machine's file on the new machine, I saw that it was UTF-8 encoded, whereas the new machine was writing the file in ANSI. So I started writing the file in UTF-8. The file then became parseable and was exactly the same as the file on the old machine.

The problem now is that the service still sends the XML string with the character xA0 in it, while the local file, written in UTF-8, shows "Â " at that position. The comparison logic compares these two strings and reports a difference, when the only actual difference is the encoding.

I am fairly sure that I want to write the files in UTF-8, because then the files are identical on both machines. But how do I convert the string sent by the service so that it is in UTF-8 and a difference is reported only when the content actually differs? Encoding is really confusing to me. Please help me understand what is actually happening here.
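
A minimal sketch of the likely mechanism, assuming the "ANSI" code page on the machine is windows-1252 (not confirmed above). The class and method names are made up for illustration. It shows that a non-breaking space is one byte in ANSI but two bytes in UTF-8, and that reading those two UTF-8 bytes as ANSI produces "Â" followed by a non-breaking space:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class NbspEncodingDemo {
    // Assumption: "ANSI" means windows-1252 on this machine.
    static final Charset ANSI = Charset.forName("windows-1252");

    // Render bytes as space-separated upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    // Encode as UTF-8, then (wrongly) decode the bytes as ANSI.
    static String misreadUtf8AsAnsi(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), ANSI);
    }

    public static void main(String[] args) {
        String nbsp = "\u00A0"; // non-breaking space
        System.out.println(hex(nbsp.getBytes(ANSI)));                   // A0
        System.out.println(hex(nbsp.getBytes(StandardCharsets.UTF_8))); // C2 A0
        // U+00C2 is "Â"; the misread string is "Â" plus a non-breaking space.
        System.out.println(misreadUtf8AsAnsi(nbsp).equals("\u00C2\u00A0")); // true
    }
}
```

If this is what is happening, the file on disk may be correct and the "Â " may only appear because the bytes are decoded with the wrong charset somewhere on the reading side.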

Another thing to note: Notepad (I check the encoding in the Save As dialog) shows the XML file as ANSI on the old Windows 7 machine, but when I copy that same file to my new Windows 10 machine, it shows as UTF-8. Was there some issue in Windows 7 that was fixed in Windows 10, and could that explain why the same file shows a different encoding on the two machines?

I already asked a question about this and answered it myself, since writing the file in UTF-8 solved the parsing issue.

I have already tried the following:

byte[] bytes = retVal.getBytes(StandardCharsets.UTF_8);
retVal = new String(bytes, StandardCharsets.UTF_8);

retVal is the string sent by the service. When comparing retVal and the string written to the file, I still get a difference.
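
A minimal sketch of why that round trip cannot change anything: encoding a Java String to UTF-8 bytes and decoding the same bytes as UTF-8 gives back an equal String (for any well-formed String). The helper `dumpCodePoints` is a hypothetical name, added only to show a way of seeing exactly which characters each of the two strings contains:

```java
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // The same round trip as in the attempt above.
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // List every code point as U+XXXX so invisible characters show up.
    static String dumpCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String fromService = "a\u00A0b";
        System.out.println(roundTrip(fromService).equals(fromService)); // true
        System.out.println(dumpCodePoints(fromService));    // U+0061 U+00A0 U+0062
        System.out.println(dumpCodePoints("a\u00C2\u00A0b")); // U+0061 U+00C2 U+00A0 U+0062
    }
}
```

Dumping both strings this way before comparing them would show whether the extra U+00C2 is in the string read from the file or in the one from the service.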

This is the code I use to get the string from the service:

    String req() throws IOException, SAXException {
        HttpClient client = new HttpClient() {};
        client.getParams().setParameter("http.useragent", "Service");

        String url = "url";

        // Generate the request body
        String reqBody = generateRequestBody(params);
        PostMethod method = new PostMethod(url);
        method.setRequestBody(reqBody);

        String retVal = "";
        // Execute the HTTP call
        int returnCode = client.executeMethod(method);

        if (returnCode == HttpStatus.SC_OK) {
            // Parse the response body as XML
            DOMParser parser = new DOMParser();
            parser.parse(new InputSource(method.getResponseBodyAsStream()));
            Document doc = parser.getDocument();
            doc.setXmlStandalone(true);
            NodeList nList = doc.getElementsByTagName("tag1");
            Node node = nList.item(0);

            // Convert the node to a String and return it
            retVal = nodeToString(node);
        }
        return retVal;
    }

    private String nodeToString(Node node) {
        StringWriter sw = new StringWriter();

        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
            t.transform(new DOMSource(node), new StreamResult(sw));
        } catch (TransformerException te) {
            LOG.info(getStacktraceFromException(te));
            LOG.error("Exception during String to XML transformation ", te);
        }
        return sw.toString();
    }

So I tried to fix the encoding at the source, but unfortunately that did not work either. This is my new nodeToString method.

    private String nodeToString(Node node){
        StringWriter sw = new StringWriter();
        String strRepeatString = "";
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            StreamResult sr = new StreamResult(new OutputStreamWriter(bos, "UTF-8"));

            t.transform(new DOMSource(node), sr);
            byte[] outputBytes = bos.toByteArray();
            strRepeatString = new String(outputBytes, "UTF-8");

        } catch (TransformerException te) {
            LOG.info(getStacktraceFromException(te));
            LOG.error("Exception during String to XML transformation ", te);
        } catch (UnsupportedEncodingException ex) {
            LOG.info("Error");
        }
        return strRepeatString;
    }

When I compare strRepeatString with the local file saved in UTF-8 (the code for saving it is in the answer to my earlier question), I still get a difference at the character Â.
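
A minimal sketch of a comparison that keeps the charset explicit on both the write and the read side. This is not the actual comparison code, which is not shown here; the class name, helper names, and temp file are made up for illustration. The assumption being tested is that the "Â" appears when the UTF-8 file is read back with the platform default charset instead of UTF-8:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FileCompareDemo {
    // Write the string as UTF-8 bytes, independent of the platform default.
    static void writeUtf8(Path file, String content) throws IOException {
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
    }

    // Read the file back, decoding explicitly as UTF-8.
    static String readUtf8(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the string returned by the service.
        String fromService = "<tag1>a\u00A0b</tag1>";
        Path tmp = Files.createTempFile("service-response", ".xml");
        try {
            writeUtf8(tmp, fromService);
            System.out.println(readUtf8(tmp).equals(fromService)); // true
        } finally {
            Files.delete(tmp);
        }
    }
}
```

When both sides use the same explicit charset, the string from the file and the string in memory compare equal, so any remaining difference would point to a reader or writer that still relies on the default charset.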
