Displaying Unicode characters above U+FFFF on Windows


The application I'm developing with EVC++ 4 runs on Windows CE 5 and should support Unicode (AFAIK wchar_t uses UTF-16 on Windows, so I'm using that), so I want to be able to test it with "more exotic" characters, especially ones that take 4 bytes in UTF-16 and not just 2. Therefore I'm trying to display such characters in a text editor (at the moment on my desktop PC with Windows XP, not on the embedded device).

But I haven't managed to do so yet. As an example I've chosen this character. As mentioned here, "MPH 2B Damase" should support this character. So I downloaded the font and put it into Windows\Fonts. I created a text file using a hex editor (just to be sure) with the following content:

FFFE D802 DC00

When I open it with Notepad (which should be Unicode-capable, right?) and use the downloaded font, it doesn't display 1 character, as intended, but these 2:

˘Ü

What am I doing wrong? :)

Thanks!

hrniels

Edit: Flipping the BOM, as suggested, doesn't work. Notepad (and all the other editors I tried, too) displays two squares in this case. Interestingly, if I copy the two squares here (with Firefox), I see the right character:


I've also tried it with Komodo Edit with the same result.

Using UTF-8 doesn't help Notepad either.


There are 3 best solutions below

0
On

You probably haven't read the _wfopen() documentation, which describes the encoding parameter. BTW, I'm assuming you are already using Unicode (wchar_t).

I would recommend using UTF-8 in files, with or without a BOM, and forcing your fopen call to use the UTF-8 flag. It looks like _wfopen(L"newfile.txt", L"r, ccs=UTF-8"); will work with UTF-8 with or without a BOM, and also with UTF-16. Do not make the mistake of using ccs=Unicode: UTF-8 files without a BOM are common.

You should really read a little about Unicode before trying to work with it. Think of this as a very good investment: it will save you time once you understand how Unicode works.

Here is a start: http://blog.i18n.ro/newbie-guide-to-unicode/ (and do not forget to read the links at the end of the article).

If you really need a simple text editor that allows you to play with Unicode encodings, use Notepad++ and forget about Notepad.

4
On

Your text editor might not like UTF-16. It probably assumes ANSI or UTF-8.

Try typing in the UTF-8 equivalent instead:

0xF0 0x90 0xA0 0x80

This won't help your testing, but will make sure your font isn't at fault. A text editor that does support UTF-16 is Komodo Edit.
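For reference, the pair D802 DC00 from the question corresponds to U+10800, and the four bytes above are that code point in UTF-8. Here is a minimal sketch of the encoding, in case you want to generate other test characters. The name encode_utf8 is made up for this example and is not part of any Windows API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode one Unicode code point as UTF-8 bytes.
// Hypothetical helper for illustration only.
std::vector<std::uint8_t> encode_utf8(std::uint32_t cp) {
    // Surrogates and values above U+10FFFF are not valid code points.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    std::vector<std::uint8_t> out;
    auto put = [&out](std::uint32_t v) {
        out.push_back(static_cast<std::uint8_t>(v));
    };

    if (cp < 0x80) {                 // 1 byte: 0xxxxxxx
        put(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        put(0xC0 | (cp >> 6));
        put(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        put(0xE0 | (cp >> 12));
        put(0x80 | ((cp >> 6) & 0x3F));
        put(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        put(0xF0 | (cp >> 18));
        put(0x80 | ((cp >> 12) & 0x3F));
        put(0x80 | ((cp >> 6) & 0x3F));
        put(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Calling encode_utf8(0x10800) yields the bytes F0 90 A0 80, which matches the sequence above.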

3
On

What happens if you put the byte order mark the other way around?

FEFF D802 DC00

(At the moment the BOM FF FE marks the file as little-endian, so the bytes D8 02 DC 00 are being interpreted as the two characters U+02D8 U+00DC. Hopefully flipping the BOM will cause the bytes to be read in the intended order.)
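To make the byte-order point concrete, here is a rough sketch of how a reader that honours the BOM would decode the file. The function name decode_utf16_with_bom is invented for this example, and it assumes well-formed input with a BOM present; it is not how Notepad is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode a UTF-16 byte stream that starts with a BOM into code points.
// Hypothetical helper: assumes a BOM is present and does no error recovery.
std::vector<std::uint32_t> decode_utf16_with_bom(const std::vector<std::uint8_t>& bytes) {
    assert(bytes.size() >= 2 && bytes.size() % 2 == 0);
    // FF FE means little-endian, FE FF means big-endian.
    const bool little = (bytes[0] == 0xFF && bytes[1] == 0xFE);
    assert(little || (bytes[0] == 0xFE && bytes[1] == 0xFF));

    std::vector<std::uint32_t> out;
    std::uint32_t high = 0;  // pending high surrogate, 0 if none
    for (std::size_t i = 2; i + 1 < bytes.size(); i += 2) {
        const std::uint32_t b0 = bytes[i];
        const std::uint32_t b1 = bytes[i + 1];
        const std::uint32_t unit = little ? (b0 | (b1 << 8)) : ((b0 << 8) | b1);

        if (unit >= 0xD800 && unit <= 0xDBFF) {
            high = unit;  // first half of a surrogate pair
        } else if (unit >= 0xDC00 && unit <= 0xDFFF && high != 0) {
            // Combine the pair into one code point above U+FFFF.
            out.push_back(0x10000 + ((high - 0xD800) << 10) + (unit - 0xDC00));
            high = 0;
        } else {
            out.push_back(unit);  // ordinary BMP character
            high = 0;
        }
    }
    return out;
}
```

With the original bytes FF FE D8 02 DC 00 this produces U+02D8 and U+00DC, the two characters Notepad showed. With FE FF D8 02 DC 00 it produces the single code point U+10800. Keeping FF FE and swapping the bytes of each unit instead (02 D8 00 DC) gives the same result.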