Confusion about the Java native2ascii tool


I am a little confused about the Java native2ascii tool. The Java 6 documentation defines the tool as follows:

Converts a file with native-encoded characters (characters which are non-Latin 1 and non-Unicode) to one with Unicode-encoded characters.

Then why does it also transform characters that belong to the Latin-1 table (such as é) into their Unicode-escaped representation (\u00e9)?

The Latin-1 (ISO 8859-1) table is available here, for instance: http://en.wikipedia.org/wiki/ISO/IEC_8859-1#Codepage_layout

This seems to imply that I cannot work directly with properties files for some European languages such as French.

To clarify my question:

According to its description, native2ascii shouldn't convert Latin-1 characters. é is a valid Latin-1 character, so why is it converted?


There are 2 best solutions below


You can work with properties files that contain French and other non-ASCII characters. Properties accepts \uxxxx escape sequences, and you can also use national characters directly because Properties has a load(Reader reader) method (available since Java 6). The file can then be in any encoding, as long as you supply a reader that decodes it correctly, e.g. new InputStreamReader(new FileInputStream("1.properties"), Charset.forName("ISO-8859-1"));
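To make the Reader approach concrete, here is a minimal sketch. The class name ReaderPropertiesSketch, the helper load, and the sample key greeting are made up for illustration, and the file content is simulated with an in-memory byte array rather than a real file. It assumes Java 7 or later for StandardCharsets and try-with-resources.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper class, not part of the JDK.
class ReaderPropertiesSketch {

    // Decodes the raw bytes with the given charset and passes the
    // resulting characters to Properties.load(Reader).
    static Properties load(byte[] fileBytes, Charset charset) throws IOException {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(fileBytes), charset)) {
            props.load(reader);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Simulates a properties file saved as UTF-8 that contains an unescaped e-acute.
        byte[] utf8File = "greeting=caf\u00e9\n".getBytes(StandardCharsets.UTF_8);
        Properties props = load(utf8File, StandardCharsets.UTF_8);
        // The value comes back as the four characters c, a, f, e-acute.
        System.out.println(props.getProperty("greeting").length());
    }
}
```

With a real file you would wrap a FileInputStream instead of the ByteArrayInputStream, as in the one-liner above. The important part is that the charset passed to the reader matches the encoding the file was saved in.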

I also agree that native2ascii should not convert é, because it is a legal Latin-1 character and the docs say Latin-1 characters are not converted.

0
On

The source of confusion might be that the documentation changed with Java version 7.

In Java 6 the documentation for Solaris and Unix ( http://docs.oracle.com/javase/6/docs/technotes/tools/solaris/native2ascii.html ) says: "The Java compiler and other Java tools can only process files which contain Latin-1 and/or Unicode-encoded (\udddd notation) characters. native2ascii converts files which contain other character encodings into files containing Latin-1 and/or Unicode-encoded characters."

I think it clearly means that the output is Latin-1, and characters not in Latin-1 will be Unicode-encoded in the output.

I checked OpenJDK 6 on Ubuntu, and the native2ascii there does not conform to that documentation: it outputs Latin-1 characters as Unicode escapes. So either the documentation or the native2ascii tool can be considered incorrect in that case.

However, in Java 7 and Java 8 the documentation ( http://docs.oracle.com/javase/7/docs/technotes/tools/solaris/native2ascii.html and https://docs.oracle.com/javase/8/docs/technotes/tools/unix/native2ascii.html ) says: "native2ascii converts files that are encoded to any character encoding that is supported by the Java runtime environment to files encoded in ASCII, using Unicode escapes ("\uxxxx" notation) for all characters that are not part of the ASCII character set."

I checked the OpenJDK 8 native2ascii on Ubuntu and found that it behaves as documented: it converts Latin-1 characters outside ASCII to Unicode escapes.
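The behaviour described in the Java 7/8 documentation can be approximated with a few lines of Java. This is only a sketch of the documented rule (ASCII passes through, everything else becomes a \uxxxx escape), not the actual native2ascii source. The class and method names are invented for illustration, and the lowercase hex digits are an assumption based on the \u00e9 output shown in the question.

```java
// Hypothetical re-implementation of the documented escaping rule.
class Native2AsciiSketch {

    // Copies ASCII characters unchanged and replaces every other UTF-16
    // code unit with a backslash, the letter u, and four hex digits.
    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The e-acute (code point 0xE9) is in Latin-1 but not in ASCII,
        // so it is escaped under this rule.
        System.out.println(escapeNonAscii("caf\u00e9"));
    }
}
```

Under this rule é is escaped even though it is a Latin-1 character, which matches what the question observed.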

Note that the Java 7/8 documentation also mentions: "This process is required for properties files containing characters not in ISO-8859-1 character sets."

I think it clearly means that properties files containing Latin-1 (aka ISO-8859-1) encoded characters are still valid.
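A quick way to convince yourself of this is to load the same property twice through Properties.load(InputStream), which is documented to assume ISO 8859-1: once with a raw Latin-1 byte for é and once with the escaped form. The sketch below does that with in-memory byte arrays; the class name, the helper loadFromStream, and the sample key city are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class, not part of the JDK.
class Latin1PropertiesSketch {

    // Loads properties from raw bytes via load(InputStream), which
    // decodes the bytes as ISO 8859-1 and also resolves Unicode escapes.
    static Properties loadFromStream(byte[] fileBytes) throws IOException {
        Properties props = new Properties();
        try (InputStream in = new ByteArrayInputStream(fileBytes)) {
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // File content with a raw Latin-1 byte (0xE9) for the e-acute.
        byte[] rawLatin1 = "city=Orl\u00e9ans\n".getBytes(StandardCharsets.ISO_8859_1);
        // Same content as native2ascii would write it: pure ASCII with an escape.
        byte[] escaped = "city=Orl\\u00e9ans\n".getBytes(StandardCharsets.US_ASCII);

        String fromRaw = loadFromStream(rawLatin1).getProperty("city");
        String fromEscaped = loadFromStream(escaped).getProperty("city");
        System.out.println(fromRaw.equals(fromEscaped)); // prints true
    }
}
```

So for a file that really is saved as ISO-8859-1, running native2ascii is optional; it only becomes necessary for characters outside Latin-1 or for files saved in another encoding.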