Gettext failed to extract non-ASCII characters


In my source files I have strings containing non-ASCII characters, like

sCursorFormat = TRANSLATE("Frequency (Hz): %s\nDegree (°): %s");

But when I extract them, the non-ASCII characters vanish:

msgid ""
"Frequency (Hz): %s\n"
"Degree (): %s"
msgstr ""

I have specified the encoding when extracting as

xgettext --from-code=UTF-8

I'm running under MS Windows and the source files are C++ (not that it should matter).


The encoding of your source file is probably not UTF-8 but "ANSI", which on Windows means whatever code page is configured for non-Unicode applications (probably code page 1252). If you open the file in a hex editor, you will see the degree symbol stored as the single byte 0xB0. On its own, that byte is not valid UTF-8: in UTF-8 the degree symbol is encoded as the two bytes 0xC2 0xB0, and a lone 0xB0 is a continuation byte with no lead byte before it. This is why the character vanishes when you use --from-code=UTF-8.
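If no hex editor is at hand, the same check can be done in code. The following is a minimal sketch, not part of gettext: the function name firstInvalidUtf8Byte is made up for illustration, and the check is deliberately simplified (it looks only at the lead/continuation byte structure and does not reject overlong encodings or surrogates). It reports the position of the first byte that cannot belong to well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a gettext API): returns the index of the first
// byte that breaks the UTF-8 lead/continuation structure, or
// std::string::npos if the whole string passes this simplified check.
std::size_t firstInvalidUtf8Byte(const std::string& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;                        // continuation bytes expected
        if (lead < 0x80)                extra = 0;    // 0xxxxxxx: ASCII
        else if ((lead & 0xE0) == 0xC0) extra = 1;    // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) extra = 2;    // 1110xxxx
        else if ((lead & 0xF8) == 0xF0) extra = 3;    // 11110xxx
        else return i;                                // stray 10xxxxxx or invalid lead

        // The sequence must not run past the end of the input.
        if (extra > bytes.size() - i - 1) return i;

        // Every following byte must look like 10xxxxxx.
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return i;
        }
        i += extra + 1;
    }
    return std::string::npos;
}
```

Fed the bytes of a line saved in code page 1252, such as "Degree (\xB0): %s", the function points at the 0xB0 byte; the same line saved as UTF-8, "Degree (\xC2\xB0): %s", passes.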

The solution is to use --from-code=windows-1252 or, better yet, to save all source files as UTF-8 and keep using --from-code=UTF-8.
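In practice, re-saving the files from an editor or running a converter such as iconv is the usual way to get UTF-8 sources. Purely to illustrate what that conversion does to the degree symbol, here is a rough sketch under a stated assumption: it handles only ASCII and the bytes 0xA0 to 0xFF, where Windows-1252 matches Latin-1 and each byte value equals its Unicode code point. The function name cp1252HighRangeToUtf8 is hypothetical, and bytes 0x80 to 0x9F (the euro sign, curly quotes and so on) would need a full lookup table, so the sketch rejects them.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: converts Windows-1252 text to UTF-8, but only for
// ASCII and the 0xA0-0xFF range, where the byte value is the code point.
std::string cp1252HighRangeToUtf8(const std::string& input) {
    std::string out;
    out.reserve(input.size() * 2);
    for (char ch : input) {
        const unsigned char b = static_cast<unsigned char>(ch);
        if (b < 0x80) {
            out.push_back(ch);                                    // ASCII is unchanged
        } else if (b >= 0xA0) {
            // Code points U+00A0..U+00FF take two UTF-8 bytes: 110xxxxx 10xxxxxx.
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        } else {
            // 0x80-0x9F are not covered by this simplified sketch.
            throw std::invalid_argument(
                "byte in 0x80-0x9F needs a full Windows-1252 table");
        }
    }
    return out;
}
```

For the degree symbol, the single byte 0xB0 becomes the pair 0xC2 0xB0, which is exactly what xgettext expects to find when told --from-code=UTF-8.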