Java JDK 18 in IntelliJ prints a question mark "?" when I try to print Unicode characters like "\u1699"

tldr: I downgraded to JDK 17 (17.0.2) and now it works...

I was watching a beginner's Java tutorial by Kody Simpson on YouTube (youtube.com/watch?v=t9LP9Nt9Nco). In that tutorial Kody prints unusual Unicode symbols like "☯Ωøᚙ", but for me it just prints "?", a question mark.

char letter = '\u1699';
System.out.println(letter);

I tried pretty much every solution on Stack Overflow, such as:

  • Changing the File Encoding to UTF-8, although mine was already UTF-8 by default.
  • Putting '-Dconsole.encoding=UTF-8' and '-Dfile.encoding=UTF-8' in Edit Custom VM Options.
  • Changing the Region settings in Control Panel.

None of it worked.

Every post I found was also from many years ago, such as this one, which is 12 years old:

unicode characters appear as question marks in IntelliJ IDEA console

I ended up deleting and re-downloading IntelliJ because I thought I had messed up some settings and wanted a fresh start. This time I set the Project SDK to an older version, Oracle OpenJDK 14.0.1, and somehow it worked and printed the 'ᚙ' symbol.

Then I realized the problem might be the latest JDK, version 18, so I downloaded JDK 17.0.2, and BOOM, it still works and prints the symbol 'ᚙ', so that's nice :). But when I switch back to JDK 18, it just prints "?" again.

It's also strange because, on JDK 18, I can copy and paste the ᚙ symbol directly into the code editor:

char letter = 'ᚙ';
System.out.println(letter);

But when I press Run, it STILL prints a question mark.

I have no clue why this happens. I started learning to code 2 days ago, so I'm probably dumb, or the new version has a bug. I never found a solution through Google or here, which is why I'm making my first ever Stack Overflow post.


TLDR: Use this on Java 18:

-Dfile.encoding="UTF-8" -Dsun.stdout.encoding="UTF-8" -Dsun.stderr.encoding="UTF-8"

From JEP 400:

There are three charset-related system properties used internally by the JDK. They remain unspecified and unsupported, but are documented here for completeness: sun.stdout.encoding and sun.stderr.encoding — the names of the charsets used for the standard output stream (System.out) and standard error stream (System.err), and in the java.io.Console API. sun.jnu.encoding — the name of the charset used by the implementation of java.nio.file when encoding or decoding filename paths, as opposed to file contents. On macOS its value is "UTF-8"; on other platforms it is typically the default charset.

As you can see, two of those system properties, sun.stdout.encoding and sun.stderr.encoding, "remain unspecified and unsupported", but they solved my problem. So use them at your own risk, and DO NOT use them in a production environment. I'm running Eclipse on Windows 10, by the way.

I think there should be a proper way to set the default charset of the JVM at startup, and it is frustrating that passing -Dfile.encoding="UTF-8" does not do that. As you can read in JEP 400:

If file.encoding is set to "UTF-8" (i.e., java -Dfile.encoding=UTF-8), then the default charset will be UTF-8. This no-op value is defined in order to preserve the behavior of existing command lines.

And this is exactly what it is NOT doing: passing -Dfile.encoding="UTF-8" does not preserve the behavior of existing command lines! I think this shows that Java 18's implementation of JEP 400 is not doing what it should, which is the root of your problem in the first place.
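A quick way to see which of these settings actually took effect is to print them from inside the program. The sketch below is a hypothetical diagnostic (the class EncodingReport and its helper encodingReport() are illustrative names, not part of any JDK API). The sun.* properties may come back null because they are unsupported, native.encoding needs JDK 17 or later, and System.out.charset() needs JDK 18 or later.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingReport {

    // Collects the charset-related settings discussed in JEP 400.
    // The sun.* properties are unspecified and unsupported, so they may be null.
    static Map<String, String> encodingReport() {
        Map<String, String> report = new LinkedHashMap<>();
        report.put("file.encoding", System.getProperty("file.encoding"));
        report.put("native.encoding", System.getProperty("native.encoding")); // JDK 17+
        report.put("sun.stdout.encoding", System.getProperty("sun.stdout.encoding"));
        report.put("sun.stderr.encoding", System.getProperty("sun.stderr.encoding"));
        report.put("Charset.defaultCharset()", Charset.defaultCharset().name());
        return report;
    }

    public static void main(String[] args) {
        encodingReport().forEach((key, value) -> System.out.println(key + " = " + value));
        // PrintStream.charset() requires JDK 18; it is the charset System.out really encodes with.
        System.out.println("System.out.charset() = " + System.out.charset().name());
    }
}
```

Running it once with and once without the -D options above should show whether Charset.defaultCharset() and System.out.charset() agree. When they differ, println output follows System.out.charset(), not the default charset.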


I can replicate your problem: printing works correctly when your code is compiled and run with JDK 17, and fails when it is compiled and run with JDK 18.

One of the changes implemented in Java 18 was JEP 400: UTF-8 by Default. The summary for that JEP stated:

Specify UTF-8 as the default charset of the standard Java APIs. With this change, APIs that depend upon the default charset will behave consistently across all implementations, operating systems, locales, and configurations.

That sounds good, except one of the goals of that change was (with my emphasis added):

Standardize on UTF-8 throughout the standard Java APIs, except for console I/O.

So I think your problem arose because you had ensured that the console's encoding in IntelliJ IDEA was UTF-8, but the PrintStream that you were using to write to that console (i.e. System.out) was not.

The Javadoc for PrintStream states (with my emphasis added):

All characters printed by a PrintStream are converted into bytes using the given encoding or charset, or the default charset if not specified.

Since your PrintStream was System.out, you had not specified any "encoding or charset", and were therefore using the "default charset", which was presumably not UTF-8. So to get your code to work on Java 18, you just need to ensure that your PrintStream is encoding with UTF-8. Here's some sample code to show the problem and the solution:

package pkg;

import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Humpty {

    public static void main(String[] args) {

        char letter = 'ᚙ';
        String charset1 = System.out.charset().displayName();  // charset() requires JDK 18

        System.out.println("Writing the character " + letter + " to a PrintStream with charset " + charset1); // fails

        PrintStream ps = new PrintStream(new FileOutputStream(FileDescriptor.out), true, StandardCharsets.UTF_8);
        String charset2 = ps.charset().displayName(); // charset() requires JDK 18
        ps.println("Writing the character " + letter + " to a PrintStream with charset " + charset2); // works
    }
}

This is the output in the console when running that code:

C:\Java\jdk-18\bin\java.exe -javaagent:C:\Users\johndoe\AppData\Local\JetBrains\Toolbox\apps\IDEA-U\ch-0\221.5080.93\lib\idea_rt.jar=64750:C:\Users\johndoe\AppData\Local\JetBrains\Toolbox\apps\IDEA-U\ch-0\221.5080.93\bin -Dfile.encoding=UTF-8 -classpath C:\Users\johndoe\IdeaProjects\HelloIntellij\out\production\HelloIntellij pkg.Humpty
Writing the character ? to a PrintStream with charset windows-1252
Writing the character ᚙ to a PrintStream with charset UTF-8

Process finished with exit code 0

Notes:

  • PrintStream has a new method in Java 18 named charset() which "returns the charset used in this PrintStream instance". The code above calls charset() and shows that on my machine the "default charset" is windows-1252, not UTF-8.
  • I used IntelliJ IDEA 2022.1 Beta (Ultimate Edition) for testing.
  • In the console I used font DejaVu Sans to ensure that the character "ᚙ" could be rendered.
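The "?" in the first output line is produced by the encoder, not by the console: windows-1252 has no mapping for U+1699, so the charset's replacement byte 0x3F ('?') is written instead. The sketch below shows this without any console involved, using String.getBytes(Charset), which replaces unmappable characters the same way. It assumes windows-1252 is available (it ships with standard JDK builds); ReplacementDemo and its helpers are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReplacementDemo {

    static final char LETTER = '\u1699'; // the same char as 'ᚙ'

    // True if the charset has a mapping for the character.
    static boolean canEncode(Charset charset, char c) {
        return charset.newEncoder().canEncode(c);
    }

    // String.getBytes(Charset) substitutes the charset's replacement bytes
    // for unmappable characters instead of throwing an exception.
    static byte[] encode(Charset charset, char c) {
        return String.valueOf(c).getBytes(charset);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        for (Charset cs : new Charset[] {cp1252, StandardCharsets.UTF_8}) {
            System.out.println(cs.name() + ": canEncode=" + canEncode(cs, LETTER)
                    + ", bytes=" + Arrays.toString(encode(cs, LETTER)));
        }
    }
}
```

It prints "windows-1252: canEncode=false, bytes=[63]" and "UTF-8: canEncode=true, bytes=[-31, -102, -103]". Java bytes are signed, so the UTF-8 sequence 0xE1 0x9A 0x99 shows up as negative numbers.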

UPDATE: To address the issue raised in the comments below by Mostafa Zeinali, the PrintStream used by System.out can be redirected to a UTF-8 PrintStream by calling System.setOut(). Here's sample code:

    String charsetOut = System.out.charset().displayName();
    if (!"UTF-8".equals(charsetOut)) {
        System.out.println("The charset for System.out is " + charsetOut + ". Changing System.out to use charset UTF-8");
        System.setOut(new PrintStream(new FileOutputStream(FileDescriptor.out), true, StandardCharsets.UTF_8));
        System.out.println("The charset for System.out is now " + System.out.charset().displayName());
    }

This is the output from that code on my Windows 10 machine:

The charset for System.out is windows-1252. Changing System.out to use charset UTF-8
The charset for System.out is now UTF-8

Note that System.out is a final variable, so you can't directly assign a new PrintStream to it. This code fails to compile with the error "Cannot assign a value to final variable 'out'":

System.out = new PrintStream(new FileOutputStream(FileDescriptor.out), true, StandardCharsets.UTF_8); // Won't compile
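Since the first answer sets sun.stderr.encoding as well as sun.stdout.encoding, the same System.setOut() approach can presumably be extended to System.err with System.setErr(). The sketch below does that; utf8PrintStream and useUtf8StandardStreams are hypothetical helper names, and the PrintStream(OutputStream, boolean, Charset) constructor it relies on has existed since Java 10.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8Console {

    // Wraps any byte stream in an auto-flushing PrintStream that always encodes with UTF-8.
    static PrintStream utf8PrintStream(OutputStream out) {
        return new PrintStream(out, true, StandardCharsets.UTF_8);
    }

    // Replaces both standard streams, so stderr gets the same treatment as stdout.
    static void useUtf8StandardStreams() {
        System.setOut(utf8PrintStream(new FileOutputStream(FileDescriptor.out)));
        System.setErr(utf8PrintStream(new FileOutputStream(FileDescriptor.err)));
    }

    public static void main(String[] args) {
        useUtf8StandardStreams();
        System.out.println("stdout: \u1699");
        System.err.println("stderr: \u1699");
    }
}
```

Unlike the -Dsun.stdout.encoding option, this uses only public API, but it affects only output written after useUtf8StandardStreams() runs; anything printed earlier already went through the original streams.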

Update IntelliJ IDEA to version 2022.2.1 or later. A very similar problem was classified as a bug; you can find more details here.


I had the same trouble. Changing the setting in File > Settings... > Editor > General > Console to UTF-32 solved the issue.