I'm working on an app whose content is in German. I get the data as XML, parse it with a SAX parser, and display it in a TextView. Everything works except for special characters, which come out garbled after parsing.
This is the XML I fetch from the URL. It is UTF-8 encoded, and all the characters are correct in the file itself:
<?xml version="1.0" encoding="utf-8"?>
<posts>
<page id="001">
<title><![CDATA[Sie kaufen bei uns ausschließlich Holzkunst- und Volkskunst-Produkte ]]></title>
<detial><![CDATA[Durch enge Beziehungen mit unseren Lieferanten können wir attraktive rückläufig
Preise und schnelle Lieferungen gewährleisten. Caroline Féry and Laura Herbst Universität Potsdam Mein
Flugzeug hatte zwölf Stunden VERSPÄTUNG </p>]]></detial>
</page>
</posts>
This is the code that parses the XML with SAX and displays the result:
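import android.app.Activity;
import android.os.Bundle;
import android.util.Log;
import android.widget.ListView;

import java.io.StringReader;
import java.util.ArrayList;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class GermanParseActivity extends Activity {
    static final String URL = "http://www.xyz.com/id=1";
    ItemList itemList;

    /** Called when the activity is first created. */
    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);

        // XMLParser is my own helper class (not shown); it downloads the XML as a String.
        XMLParser parser = new XMLParser();
        String XML = parser.getXmlFromUrl(URL);
        System.out.println("This XML is ========>" + XML);

        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser sp = spf.newSAXParser();
            XMLReader xr = sp.getXMLReader();

            /** Create handler to handle XML Tags ( extends DefaultHandler ) */
            MyXMLHandler myXMLHandler = new MyXMLHandler();
            xr.setContentHandler(myXMLHandler);

            // Hand SAX the String as characters; XML.getBytes() without a charset
            // would depend on the platform's default encoding.
            xr.parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            // Don't swallow parse errors silently; itemList would be null below.
            Log.e("GermanParseActivity", "XML parsing failed", e);
            return;
        }

        itemList = MyXMLHandler.itemList;
        ArrayList<String> listItem = itemList.getTitle();
        ListView lview = (ListView) findViewById(R.id.listview1);
        myAdapter adapter = new myAdapter(this, listItem);
        lview.setAdapter(adapter);
    }
}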
But after parsing, I get strange characters that are not in the XML file; they only appear after parsing. For example:
Before parsing       After parsing
können          ---> kÃ¶nnen
rückläufig      ---> rÃ¼cklÃ¤ufig
gewährleisten   ---> gewÃ¤hrleisten
Can anyone please suggest the proper way to fix this issue?
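You need to re-encode your input. The text is UTF-8, but somewhere along the way it is decoded as ISO-8859-1. It looks like a SAX bug, but the parser only sees what it is given: most likely the UTF-8 response is turned into a String as ISO-8859-1 before parsing (for example inside getXmlFromUrl, which isn't shown).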
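The mix-up is easy to reproduce in plain Java. This small sketch (the class and method names are made up for illustration) encodes each word as UTF-8 and then decodes those bytes as ISO-8859-1, which turns ö into Ã¶, ü into Ã¼ and ä into Ã¤. The non-ASCII literals are written as \u escapes so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Simulates the bug: the text travels as UTF-8 bytes,
    // but something decodes those bytes as ISO-8859-1.
    static String decodeUtf8AsLatin1(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "können", "rückläufig", "gewährleisten" written with \u escapes.
        String[] words = {"k\u00f6nnen", "r\u00fcckl\u00e4ufig", "gew\u00e4hrleisten"};
        for (String w : words) {
            System.out.println(w + " ---> " + decodeUtf8AsLatin1(w));
        }
    }
}
```

Each two-byte UTF-8 sequence (for ö the bytes C3 B6) becomes two separate Latin-1 characters, which is why every umlaut turns into an "Ã" followed by another symbol.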
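Re-encoding turns the wrongly decoded String back into its original bytes using ISO-8859-1 and then decodes those bytes as UTF-8, so Java ends up with the correct characters. Apply it to the XML string before handing it to the SAX parser.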
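A minimal sketch of the re-encoding step, assuming the string really is UTF-8 that was decoded as ISO-8859-1 (EncodingFix and reencodeLatin1AsUtf8 are hypothetical names). It uses StandardCharsets, which needs Java 7 or Android API level 19; on older Android versions the getBytes("ISO-8859-1") and new String(bytes, "UTF-8") overloads do the same thing but declare UnsupportedEncodingException.

```java
import java.nio.charset.StandardCharsets;

public class EncodingFix {
    /**
     * Repairs a String whose UTF-8 bytes were wrongly decoded as ISO-8859-1.
     * ISO-8859-1 maps every byte 0x00-0xFF to exactly one char, so encoding
     * the garbled String with it recovers the original bytes; decoding those
     * bytes as UTF-8 then yields the intended text.
     */
    static String reencodeLatin1AsUtf8(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "kÃ¶nnen" written with \u escapes so the source file encoding doesn't matter.
        String garbled = "k\u00c3\u00b6nnen";
        String fixed = reencodeLatin1AsUtf8(garbled);
        System.out.println(fixed.equals("k\u00f6nnen")); // true
    }
}
```

In the activity this would go right after the download, e.g. String XML = EncodingFix.reencodeLatin1AsUtf8(parser.getXmlFromUrl(URL));. Only apply it to text that is actually garbled this way; running it on correctly decoded text corrupts it. If you can change getXmlFromUrl, the cleaner fix is to decode the response as UTF-8 in the first place, or to pass the raw InputStream to new InputSource(...) so the SAX parser reads the encoding from the XML declaration itself.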