Japanese fullwidth character ー is getting garbled when converted to SHIFT_JIS in Java

My application reads Japanese text from a UTF-8 database and writes it to a file encoded as Shift_JIS. However, the full-width ー (0x817C in Shift_JIS) is written as ? in the output file.

Here is a sample program that reproduces the problem:

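import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;

public class ShiftJisTest {

    public static void main(String[] args) {
        // 東, then fullwidth digit one (U+FF11), FULLWIDTH HYPHEN-MINUS (U+FF0D), U+FF11
        String text = "東１－１";
        // Encode the text as Shift_JIS while writing it to the file
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream("output_data"), "SHIFT_JIS"))) {
            writer.write(text);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}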

Output:

東１?１

Hex bytes of the output:

93 8C 82 50 3F 82 50

The garbled character is 3F in hex; the expected bytes were 81 7C.
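
The same bytes can be reproduced in memory, without writing a file. The following is a minimal sketch, assuming the string is 東 followed by FULLWIDTH DIGIT ONE (U+FF11), FULLWIDTH HYPHEN-MINUS (U+FF0D), and another U+FF11. The class name ShiftJisHexDump and the helper toHex are hypothetical names used only for illustration.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original program.
class ShiftJisHexDump {

    // Formats bytes as space-separated, two-digit uppercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed input: 東 (U+6771), U+FF11, U+FF0D, U+FF11, written as escapes
        // so the source file encoding does not matter.
        String text = "\u6771\uFF11\uFF0D\uFF11";
        byte[] encoded = text.getBytes(Charset.forName("Shift_JIS"));
        System.out.println(toHex(encoded));
    }
}
```

String.getBytes(Charset) substitutes the charset's replacement byte for any character it cannot map, which would explain the 3F in the dump.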

There is 1 best solution below

g00se

It looks like that character is not in Shift_JIS:

goose@t410:/tmp$ uniname '\uFF0D'
The name for codepoint \uFF0D is FULLWIDTH HYPHEN-MINUS
The char is －
goose@t410:/tmp$ echo -en '\uFF0D' | iconv -t SHIFT-JIS
iconv: illegal input sequence at position 0
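
The byte pair 81 7C does exist in Shift_JIS, but Java's "Shift_JIS" charset maps it to U+2212 MINUS SIGN, while its "windows-31j" charset (also known as MS932, the Microsoft variant) maps it to U+FF0D. The sketch below shows two possible workarounds under that observation. Which one is appropriate depends on what the consumer of the file expects, and that is an assumption here. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical illustration class; the names are not from the original post.
class FullwidthHyphenEncoding {

    // Formats bytes as space-separated, two-digit uppercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Option 1: encode with windows-31j, which can map U+FF0D.
    static byte[] encodeWindows31j(String s) {
        return s.getBytes(Charset.forName("windows-31j"));
    }

    // Option 2: keep Shift_JIS, but swap U+FF0D for U+2212 before encoding.
    // Note that this changes the character, not just the encoding.
    static byte[] encodeShiftJisWithMinus(String s) {
        return s.replace('\uFF0D', '\u2212').getBytes(Charset.forName("Shift_JIS"));
    }

    public static void main(String[] args) {
        // Assumed input: 東 (U+6771), U+FF11, U+FF0D, U+FF11.
        String text = "\u6771\uFF11\uFF0D\uFF11";

        // Ask each encoder whether it can represent U+FF0D at all.
        System.out.println("Shift_JIS can encode U+FF0D:   "
                + Charset.forName("Shift_JIS").newEncoder().canEncode('\uFF0D'));
        System.out.println("windows-31j can encode U+FF0D: "
                + Charset.forName("windows-31j").newEncoder().canEncode('\uFF0D'));

        System.out.println(toHex(encodeWindows31j(text)));
        System.out.println(toHex(encodeShiftJisWithMinus(text)));
    }
}
```

In the question's program, option 1 would amount to passing "windows-31j" instead of "SHIFT_JIS" to the OutputStreamWriter. Option 2 keeps the charset name but alters the text, so it only fits if readers of the file treat MINUS SIGN and FULLWIDTH HYPHEN-MINUS as interchangeable.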