Why is the character ® (U+00AE) read differently in Java 6 and Java 7?


This is my first time asking on Stack Overflow, and my English is not very good, so please bear with me.

My application is returning a strange character:

PlayStation\ufffd\ufffd4 Pro

It should look like this:

PlayStation®4 Pro

I think the '\ufffd' character is U+FFFD, the REPLACEMENT CHARACTER.

My application runs on JDK 1.6.

When I switch the application's JDK to 1.7, it prints the character correctly:

PlayStation®4 Pro

More Information

My application uses iBATIS, and the problem appears right after queryForObject.

public class A {
    private String content;
    public String getContent() {
        return content;
    }
}
A a = (A)queryForObject("mapper.getSomething", params);
return a;
// jdk1.6 - a.getContent() : PlayStation\ufffd\ufffd4 Pro
// jdk1.7 - a.getContent() : PlayStation®4 Pro

The JDBC connection properties are:

driverClassName=com.mysql.jdbc.Driver
url=jdbc:mysql://{IPADDRESS}/{DBNAME}?Unicode=true&characterEncoding=MS949&zeroDateTimeBehavior=convertToNull&socketTimeout=500000&connectTimeout=500000

More Information 2

  • I also tested without iBATIS or anything else, using a JDBC connection directly, and got the same result.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CharacterEncodeTest {
    // JDBC driver name and database URL
    static final String DB_URL = "jdbc:mysql://{IPADDRESS}/{DBNAME}?Unicode=true&characterEncoding=MS949&zeroDateTimeBehavior=convertToNull&socketTimeout=500000&connectTimeout=500000";

    //  Database credentials
    static final String USER = "{USER}";
    static final String PASS = "{PASSWORD}";

    public static void main(String[] args) {
        Connection conn = null;
        Statement stmt = null;
        try {
            //STEP 2: Register JDBC driver
            Class.forName("com.mysql.jdbc.Driver");

            //STEP 3: Open a connection
            System.out.println("Connecting to a selected database...");
            conn = DriverManager.getConnection(DB_URL, USER, PASS);
            System.out.println("Connected database successfully...");

            //STEP 4: Execute a query
            System.out.println("Creating statement...");
            stmt = conn.createStatement();

            String sql = "SELECT * from TABLE";
            ResultSet rs = stmt.executeQuery(sql);
            //STEP 5: Extract data from result set
            while (rs.next()) {
                //Retrieve by column name
                String content = rs.getString("content");

                //Display values
                System.out.print("content: " + content);
                // jdk1.6 : PlayStation\ufffd\ufffd4 Pro
                // jdk1.7 : PlayStation®4 Pro
            }
            rs.close();
        } catch (ClassNotFoundException e) {
            // something (Class.forName declares this checked exception)
        } catch (SQLException se) {
            // something
        } finally {
            // something
        }//end try
    }
}
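
When comparing the two JDKs, it can help to print the returned string as escapes rather than rely on how the console renders it. The following is a minimal sketch, not part of the original test; the class name EscapeDump and the helper escape are made up for illustration.

```java
public class EscapeDump {
    // Returns the input with every char outside printable ASCII
    // rewritten as a backslash-u escape with four hex digits.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c > 0x7E) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The expected value: the registered sign is a single char, U+00AE.
        System.out.println(escape("PlayStation\u00ae4 Pro"));
        // The value reported on JDK 1.6: two U+FFFD replacement characters.
        System.out.println(escape("PlayStation\ufffd\ufffd4 Pro"));
    }
}
```

Calling escape(rs.getString("content")) in the test above would show whether the driver really returned U+FFFD or whether the console mangled the output afterwards.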

Question

The only thing I changed is the JDK version.

  1. What difference between JDK 1.6 and 1.7 causes this problem?

  2. Is there a way to solve this on JDK 1.6?


1
SeverityOne On

No idea what \ufffd is, but the ® symbol is \u00ae: https://www.fileformat.info/info/unicode/char/00ae/index.htm

1
Bishal Dubey On

No idea, but I think JDK 1.6 and JDK 1.7 use different character encodings. Please see the links below:

Does Java 1.7 use a different character encoding?

Why is my String returning "\ufffd\ufffdN a m e"

0
Joop Eggen On

If you see two replacement characters (� or ?) in place of one special character, the character was encoded as a two-byte UTF-8 sequence in which every byte is > 127, and those bytes could not be converted to characters in a single-byte encoding, which knows only 256 characters.

So a String (Unicode) was converted to UTF-8 bytes, and those bytes were then decoded with some single-byte encoding.
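
The effect described above can be reproduced in isolation. This is only a sketch: US-ASCII is used here as a stand-in for "a decoder that cannot map bytes above 127", which is an assumption for illustration and not a claim about what the MySQL driver does. The class and method names are made up.

```java
import java.nio.charset.Charset;

public class ReplacementDemo {
    // Encodes text as UTF-8, then decodes the bytes with another charset.
    static String decodeUtf8BytesAs(String text, String charsetName) {
        byte[] utf8 = text.getBytes(Charset.forName("UTF-8"));
        return new String(utf8, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The registered sign is two bytes in UTF-8: C2 AE.
        byte[] utf8 = "\u00ae".getBytes(Charset.forName("UTF-8"));
        for (byte b : utf8) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();

        // Each byte above 127 is malformed for US-ASCII, so the decoder
        // substitutes U+FFFD once per byte.
        String decoded = decodeUtf8BytesAs("\u00ae", "US-ASCII");
        for (int i = 0; i < decoded.length(); i++) {
            System.out.printf("U+%04X%n", (int) decoded.charAt(i));
        }
    }
}
```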

This could be a URL parameter encoded as UTF-8 but received as ISO-8859-1, or some other meddling along the way. URLDecoder.decode and URLEncoder.encode have overloads that take an encoding parameter. Most likely, though, the environment changed too. If ® is used directly in the Java source, the editor must use the same encoding as the javac compiler and must be able to represent the symbol (check by writing \u00AE instead).

Search for usages that rely on the default encoding:

  • string.getBytes()
  • new String(bytes)
  • URLDecoder.decode(string)
  • URLEncoder.encode(string)
  • FileReader/FileWriter
  • InputStreamReader(inputStream)
  • OutputStreamWriter(outputStream)
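
For each item in the list above there is a variant that takes the encoding explicitly. A minimal sketch, assuming UTF-8 is the encoding you want; the class and helper names are made up for illustration, and in-memory streams stand in for real files or sockets.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;

public class ExplicitCharsetDemo {
    static final Charset UTF8 = Charset.forName("UTF-8");

    // String to bytes and back, naming the charset on both sides.
    static String bytesRoundTrip(String text) {
        byte[] encoded = text.getBytes(UTF8);
        return new String(encoded, UTF8);
    }

    // URL encoding with the encoding passed explicitly.
    static String urlEncode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // URL decoding with the encoding passed explicitly.
    static String urlDecode(String text) {
        try {
            return URLDecoder.decode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the first line through an InputStreamReader with an explicit charset.
    static String readFirstLine(byte[] data) {
        try {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(new ByteArrayInputStream(data), UTF8));
            try {
                return reader.readLine();
            } finally {
                reader.close();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = "PlayStation\u00ae4 Pro";
        // What the no-argument variants would silently use on this JVM.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println(bytesRoundTrip(text).equals(text));
        System.out.println(urlEncode(text)); // prints PlayStation%C2%AE4+Pro
        System.out.println(urlDecode(urlEncode(text)).equals(text));
        System.out.println(readFirstLine(text.getBytes(UTF8)).equals(text));
    }
}
```

FileReader and FileWriter have no charset parameter in Java 6 or 7; wrapping a FileInputStream or FileOutputStream in InputStreamReader or OutputStreamWriter with an explicit charset is the usual substitute.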

Zip handling also gained Unicode support for file names.

Anti-pattern:

  • new String(string.getBytes(...), ...)
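
A short sketch of why the re-encoding pattern above is an anti-pattern: once a String holds the wrong characters, or has been encoded to a charset that cannot represent them, the original text is gone. The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;

public class ReencodeAntiPattern {
    // The anti-pattern: encode with one charset, decode with another.
    static String reencode(String s, String from, String to) {
        return new String(s.getBytes(Charset.forName(from)), Charset.forName(to));
    }

    public static void main(String[] args) {
        // UTF-8 bytes C2 AE read as ISO-8859-1 become two chars, U+00C2 and U+00AE.
        String mangled = reencode("\u00ae", "UTF-8", "ISO-8859-1");
        System.out.println(mangled.length()); // prints 2

        // US-ASCII cannot represent U+00AE, so getBytes substitutes '?'
        // and no later decoding step can bring the original back.
        String lost = reencode("\u00ae", "US-ASCII", "UTF-8");
        System.out.println(lost); // prints ?
    }
}
```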
1
Mick On

You initially got two question marks. It looks as if there was one UTF-8 character, but your code could not read the 4-byte sequence and so showed 2 question marks, each representing an unknown 2-byte character. Are you sure the data did not change at some point, while your code was never able to process UTF-8? It might have been this 4-byte character before: https://en.wikipedia.org/wiki/Enclosed_R ?
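
The sequence lengths mentioned above can be checked directly by counting UTF-8 bytes. A minimal sketch: U+24C7 (circled capital R) is only an assumption about which character the linked page refers to, and U+1F12C is included purely as an example of a code point above U+FFFF. The class and method names are made up.

```java
import java.nio.charset.Charset;

public class Utf8Length {
    // Number of bytes the given code point occupies in UTF-8.
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(Charset.forName("UTF-8")).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length(0x00AE));  // prints 2
        System.out.println(utf8Length(0x24C7));  // prints 3
        System.out.println(utf8Length(0x1F12C)); // prints 4
    }
}
```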