SAX Parser encoding issue in German Language


I'm working on an app whose content is in German. The data arrives as XML, which I parse with a SAX parser and display in a TextView. Everything works fine except that special characters come out garbled after parsing.

This is the XML I get from the URL. It is UTF-8 encoded, and all the characters are correct in the XML file.

<?xml version="1.0" encoding="utf-8"?>
<posts>
    <page id="001">
        <title><![CDATA[Sie kaufen bei uns ausschließlich Holzkunst- und Volkskunst-Produkte ]]></title>
        <detial><![CDATA[Durch enge Beziehungen mit unseren Lieferanten können wir attraktive rückläufig 
        Preise und schnelle Lieferungen gewährleisten. Caroline Féry and Laura Herbst Universität Potsdam Mein 
        Flugzeug hatte zwölf Stunden VERSPÄTUNG </p>]]></detial>
    </page>     
</posts>

I parse this XML with a SAX parser and display the parsed data in the TextView:

public class GermanParseActivity extends Activity {
    /** Called when the activity is first created. */

    static final String URL = "http://www.xyz.com/id=1";

    ItemList itemList;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);

        XMLParser parser = new XMLParser();
        String XML = parser.getXmlFromUrl(URL);

        System.out.println("This XML is ========>"+XML);

        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser sp = spf.newSAXParser();
            XMLReader xr = sp.getXMLReader();

            /** Create handler to handle XML Tags ( extends DefaultHandler ) */
            MyXMLHandler myXMLHandler = new MyXMLHandler();
            xr.setContentHandler(myXMLHandler);

            ByteArrayInputStream is = new ByteArrayInputStream(XML.getBytes());
            xr.parse(new InputSource(is));
        } catch (Exception e) {

        }

        itemList = MyXMLHandler.itemList;

        ArrayList<String> listItem = itemList.getTitle();

        ListView lview = (ListView) findViewById(R.id.listview1);
        myAdapter adapter = new myAdapter(this, listItem);
        lview.setAdapter(adapter);
    }


}

After parsing, however, I get strange characters that are not in the XML file; they only show up in the parsed output.

For example:

before parsing ---> after parsing

können ---> können

rückläufig ---> rückläufig

gewährleisten ---> gewährleisten

Can anyone please suggest the proper way to fix this issue?

There are 2 best solutions below

BEST ANSWER

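You need to re-encode your input. The problem is that the text is UTF-8 but is being interpreted as ISO-8859-1. That looks like a bug in SAX, although the wrong decoding may also happen earlier, when the downloaded response is turned into a String.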

String output=new String(input.getBytes("8859_1"), "utf-8");

That line encodes the wrongly decoded string back into its original bytes using ISO-8859-1 and then decodes those bytes as UTF-8, which restores the intended characters.
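To see why the round trip works, here is a minimal sketch that first reproduces the garbling and then undoes it. The class name `ReencodeDemo` and the helper `repair` are made up for illustration, and `StandardCharsets` assumes Java 7+ (on Android, API 19+); on older platforms the string-named overloads shown above do the same job.

```java
import java.nio.charset.StandardCharsets;

final class ReencodeDemo {
    // Hypothetical helper: undo a "UTF-8 bytes read as ISO-8859-1" mix-up.
    static String repair(String garbled) {
        // ISO-8859-1 maps chars 0..255 to single bytes one-to-one,
        // so this recovers the original UTF-8 byte sequence.
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "k\u00f6nnen"; // "koennen" written with o-umlaut

        // Simulate the fault: UTF-8 bytes decoded with the wrong charset.
        String garbled = new String(
                original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        // The o-umlaut is two bytes in UTF-8, so it turns into two characters.
        System.out.println(garbled.length() == original.length() + 1);
        System.out.println(repair(garbled).equals(original));
    }
}
```

Note that this only repairs text that was garbled in exactly this way; applying it to a string that was decoded correctly would damage it instead.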

ANOTHER ANSWER

I got my answer from here. They suggest that the XML declaration should be:

<?xml version="1.0" encoding="ISO-8859-1"?>

instead of

<?xml version="1.0" encoding="utf-8"?>

Hope that is the answer. Edit: I just saw that you don't have control over the XML, so this will not help; rekire's answer is then an option.
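Since the declaration can't be changed, it may be worth checking that the parser actually receives bytes matching it. The following is only a sketch under assumptions: `DeclaredEncodingDemo` and `textOf` are hypothetical names, the tiny XML string stands in for the real feed, and `StandardCharsets` needs Java 7+ (Android API 19+). It hands the SAX parser raw UTF-8 bytes together with a utf-8 declaration and collects the character data.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class DeclaredEncodingDemo {
    // Hypothetical helper: parse raw XML bytes and return all character data.
    static String textOf(byte[] xmlBytes) {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per element, so append.
                text.append(ch, start, length);
            }
        };
        try {
            InputSource source = new InputSource(new ByteArrayInputStream(xmlBytes));
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        } catch (Exception e) {
            // Surface parse failures instead of silently ignoring them.
            throw new IllegalStateException("XML parsing failed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<title>k\u00f6nnen</title>";
        // The bytes are produced with the same charset the declaration names.
        String parsed = textOf(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(parsed.equals("k\u00f6nnen"));
    }
}
```

If the text is already garbled in the String before it reaches the parser, the declaration is not the culprit, and the place to look is wherever the downloaded bytes were decoded.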