Encoding pinyin


I'm currently developing a program in Java, and I want to display Chinese pinyin that I fetch from a remote website.

But I have the following problem: the pinyin is displayed as ji&#462; when it should be displayed as jiǎ.

I think the answer to this question is really simple, but I'm struggling to find it.

Best answer:

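The problem is that you have an HTML-encoded Unicode character (a numeric character reference such as &#462;), and what you want is its decoded form. A library such as commons-lang3 (part of Apache Commons) can decode the HTML-encoded string so that Java displays the actual character. Note that in commons-lang3 the method is called unescapeHtml4 (unescapeHtml was the commons-lang 2.x name), and since commons-lang3 3.6 StringEscapeUtils is deprecated in favor of the class of the same name in Apache Commons Text: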

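import org.apache.commons.lang3.StringEscapeUtils; // or org.apache.commons.text.StringEscapeUtils
String decoded = StringEscapeUtils.unescapeHtml4("ji&#462;"); // "jiǎ"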
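If you would rather not add a dependency, numeric references can also be decoded with the JDK alone. This is a minimal sketch, not part of commons-lang3: the class name NumericEntityDecoder is hypothetical, it assumes Java 9 or later (for Matcher.appendReplacement with a StringBuilder), and it only handles numeric references such as &#462; or &#x1CE;. Named entities such as &amp; are left untouched, so use the library if the website sends those too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericEntityDecoder {
    // Decimal (&#462;) or hexadecimal (&#x1CE;) numeric character references.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    static String decode(String input) {
        Matcher m = NUMERIC_REF.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            int codePoint = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            // Character.toChars yields a surrogate pair for code points above U+FFFF;
            // invalid code points are left exactly as they appeared in the input.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            // quoteReplacement stops '$' or '\' in the result from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("ji&#462;"));
    }
}
```

The regex caps the digit counts so that Integer.parseInt cannot overflow; anything outside the valid Unicode range is passed through unchanged rather than throwing.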

You can also write the character as a Unicode escape in a Java string literal:

String jia = "ji\u01ce";
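To connect the escape with what the website sends, the sketch below (hypothetical class UnicodeEscapeDemo and helper toNumericReference) checks that the escaped literal equals the directly typed character and prints its code point. U+01CE is 462 in decimal, which is exactly the number in the entity &#462;. It assumes the source file is compiled as UTF-8 (the default since JDK 18; on older JDKs pass -encoding UTF-8 to javac).

```java
public class UnicodeEscapeDemo {
    // Hypothetical helper: decimal HTML numeric reference for one code point.
    static String toNumericReference(int codePoint) {
        return "&#" + codePoint + ";";
    }

    public static void main(String[] args) {
        String escaped = "ji\u01ce"; // the compiler replaces the escape with the real character
        String typed = "jiǎ";        // typed directly; needs a UTF-8 source file
        System.out.println(escaped.equals(typed));   // true
        System.out.println(escaped.length());        // 3
        int cp = escaped.codePointAt(2);
        System.out.println(cp);                      // 462
        System.out.println(toNumericReference(cp));  // &#462;
    }
}
```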

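This clever one-liner takes a Unicode character and prints its escaped form. ORing the character with 0x10000 and dropping the first hex digit zero-pads the result to four digits, so it works for any single char (that is, any character in the Basic Multilingual Plane):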

System.out.println( "\\u" + Integer.toHexString('ǎ' | 0x10000).substring(1) );
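The same idea can be extended to a whole string. This is a sketch with a hypothetical helper, escapeNonAscii, that swaps the 0x10000 trick for String.format with %04x to get the zero-padding. It escapes each UTF-16 char, so a character outside the Basic Multilingual Plane comes out as two escapes (a surrogate pair), which is how such characters have to be written in Java source anyway.

```java
public class UnicodeEscaper {
    // Replaces every non-ASCII UTF-16 unit with a Java-style backslash-u escape.
    // Iterating over chars (not code points) turns characters above U+FFFF
    // into two escapes, one per surrogate.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x80) {
                sb.append(c); // plain ASCII is kept as-is
            } else {
                sb.append(String.format("\\u%04x", (int) c)); // lowercase, zero-padded to 4 digits
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("ji\u01ce"));
    }
}
```

For the string from the question this prints ji\u01ce, matching the output of the one-liner above for the last character.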