Change UTF-8 character to Latin1 Java


In my project I read strings from a database that I cannot change because of permissions. I receive a string in some unknown encoding and convert it to UTF-8 without any problem, for instance:

String myString = "ESPAÑA";   // read from the database in an unknown encoding
String utf8 = new String(myString.getBytes(), Charset.forName("UTF-8"));
System.out.println(utf8); // prints ESPAÑA but it should print ESPAÑA

I need to take all the strings that were parsed as UTF-8 and convert them to Latin1.

I have found many methods on this site, but none of them works correctly.


There are 2 best solutions below

2
On BEST ANSWER

If you don't know the encoding of the original bytes, you can't transcode them to a known form. I wrote a paper for the Unicode Consortium on this problem; see "Mapping Text in Unspecified Character Sets to Unicode as a Canonical Representation in a Hostile Environment".

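The code new String(myString.getBytes(), Charset.forName("UTF-8")) means: I have these bytes in UTF-8; convert them into a Java String. Note that getBytes() without an argument encodes with the platform default charset, so the bytes are only UTF-8 if that default happens to be UTF-8.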

UTF-8 can encode every Unicode code point (the code space holds 1,114,112 code points, which needs 21 bits). Latin-1 can represent only 2^8 = 256 characters.

So transcoding from UTF-8 to Latin-1 is dangerous: any character without a Latin-1 equivalent will be lost, and you need to handle such unmappable characters explicitly.
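As a minimal sketch of that kind of handling: String#getBytes(Charset) silently replaces unmappable characters with '?', so one option is a CharsetEncoder configured to report them instead. The class name Latin1Transcoder and the method name toLatin1Strict are hypothetical, chosen only for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Latin1Transcoder {

    // Encodes text as ISO-8859-1 (Latin-1) and throws
    // CharacterCodingException if any character has no Latin-1 mapping.
    public static byte[] toLatin1Strict(String text) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        // U+00D1 (N with tilde) exists in Latin-1; U+20AC (euro sign) does not.
        String[] samples = { "ESPA\u00D1A", "10 \u20AC" };
        for (String sample : samples) {
            try {
                byte[] latin1 = toLatin1Strict(sample);
                System.out.println("encoded " + latin1.length + " Latin-1 bytes");
            } catch (CharacterCodingException e) {
                System.out.println("not representable in Latin-1: " + e);
            }
        }
    }
}
```

Whether to reject, replace, or drop such characters depends on the application; the sketch only makes the loss visible instead of silent.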

Transcoding from Latin-1 to UTF-8 is fine, as all characters in Latin-1 are supported in UTF-8.
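To illustrate the safe direction, here is a small sketch that re-encodes Latin-1 bytes as UTF-8 and checks that all 256 byte values survive a round trip. The class and method names are hypothetical and only serve this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1ToUtf8 {

    // Re-encodes Latin-1 bytes as UTF-8 by going through a Java String.
    public static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Returns true if every possible Latin-1 byte value comes back unchanged
    // after Latin-1 -> UTF-8 -> Latin-1.
    public static boolean allBytesRoundTrip() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        byte[] utf8 = latin1ToUtf8(all);
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        byte[] back = decoded.getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(all, back);
    }

    public static void main(String[] args) {
        System.out.println("lossless: " + allBytesRoundTrip());
    }
}
```

The UTF-8 form is longer (bytes 0x80 to 0xFF each take two bytes), but no information is lost.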

8
On

String#getBytes() returns the text as bytes using the platform default encoding. What you need instead is a byte array containing the value of each character, with no conversion taking place, so that the original UTF-8 bytes are preserved. You can get that by calling

myString.getBytes(StandardCharsets.ISO_8859_1);

So the line in your code should be changed to

String utf8 = new String(myString.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);

But this is just a workaround. What you should do first is check how you access the database, because the data should already come out correctly when you select it. As a first test, use a regular DB client and see whether the text shows up correctly there. If it does, the table contains the data correctly and something is wrong with how you retrieve it. This might be a wrong charset setting in the connection string, or you may not be using ResultSet#getString() to get the data as text but instead reading it as a byte array and creating the String in the wrong way.

Try to find the source of this and fix that. Then you don't need hacks like the above to get correct data.
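The workaround from this answer can be reproduced in isolation. The sketch below first garbles a string by decoding its UTF-8 bytes with the wrong charset, then repairs it by reversing that step. The class MojibakeDemo and the method repair are hypothetical names, and the charset that was wrongly applied is passed in as a parameter because it is an assumption about where the damage happened.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Reverses a wrong decode: turns the garbled characters back into the
    // original bytes using the charset that was wrongly applied, then
    // decodes those bytes as UTF-8.
    public static String repair(String garbled, Charset wronglyApplied) {
        byte[] originalBytes = garbled.getBytes(wronglyApplied);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "ESPA\u00D1A";

        // Simulate the bug: UTF-8 bytes decoded as ISO-8859-1.
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        String garbled = new String(utf8Bytes, StandardCharsets.ISO_8859_1);

        String repaired = repair(garbled, StandardCharsets.ISO_8859_1);
        System.out.println("garbled length:  " + garbled.length());
        System.out.println("repaired equals original: " + repaired.equals(original));
    }
}
```

This only works if the wrong decode was lossless. ISO-8859-1 maps every byte to a character, so the round trip is exact. The left quotation mark in the question's output suggests the text may have been decoded as windows-1252 instead; in that case passing Charset.forName("windows-1252") would be the analogous call, with the caveat that bytes windows-1252 leaves undefined cannot be recovered.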