Perldoc using wrong encoding for copyright symbol


I've noticed that Pod::Usage, pod2man, and even pod2markdown produce the wrong encoding for certain characters in their output. These programs emit the copyright symbol as the single byte 0xA9, which is its Unicode code point as well as its ISO-8859-1 and CP1252 encoding, rather than its UTF-8 encoding, which should be the two-byte sequence 0xC2 0xA9.
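To make the expected bytes concrete, here is a minimal, language-neutral check. It is written in Python rather than Perl purely for illustration, and the helper name `hex_bytes` is my own, not part of any Pod module. It only shows what each encoding should produce for U+00A9; it says nothing about where the Pod tools go wrong.

```python
# Illustrative sketch: hex_bytes is a hypothetical helper, not a Pod API.
COPYRIGHT = "\u00a9"  # the copyright sign, code point 0xA9 (169)


def hex_bytes(text, encoding):
    """Return the encoded form of text as space-separated uppercase hex bytes."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


print(hex_bytes(COPYRIGHT, "utf-8"))       # C2 A9  (what a UTF-8 terminal expects)
print(hex_bytes(COPYRIGHT, "iso-8859-1"))  # A9     (what the tools are emitting)
print(hex_bytes(COPYRIGHT, "cp1252"))      # A9
```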

The issue has to do with Pod::Escapes which I've updated to version 1.07 (the latest version) and utf8::unicode_to_native (which I can't find).

Looking at Pod::Escapes, the %Name2character_number hash maps the key copy to the Unicode code point 0xA9 (169), which is correct.

However, the %Name2character hash is getting the wrong representation from the utf8::unicode_to_native subroutine. In fact, every code point from 0x80 to 0xFF ends up as its single-byte representation rather than its UTF-8 encoding. All characters above 0xFF come out correctly.
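The 0x80 to 0xFF range is exactly where the single-byte form and the UTF-8 form diverge, which the following sketch tries to illustrate. Again this is Python used only as a neutral calculator, with hypothetical names (`is_valid_utf8`, `lone_bytes_valid`, `utf8_lengths`) of my own choosing. It checks that none of those lone bytes is valid UTF-8 on its own, and shows how many bytes UTF-8 needs on either side of the 0xFF boundary.

```python
# Illustrative sketch: all names here are hypothetical, chosen for this example.
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Every value 0x80..0xFF, written as a lone byte, is rejected by a UTF-8 decoder.
lone_bytes_valid = [cp for cp in range(0x80, 0x100) if is_valid_utf8(bytes([cp]))]

# Number of bytes UTF-8 needs for sample code points around the boundaries.
utf8_lengths = {
    cp: len(chr(cp).encode("utf-8"))
    for cp in (0x7F, 0x80, 0xA9, 0xFF, 0x100, 0x20AC)
}

print(lone_bytes_valid)  # []
for cp, length in utf8_lengths.items():
    print(f"U+{cp:04X}: {length} byte(s) in UTF-8")
```

So a tool that writes these code points as single bytes produces output that a UTF-8 terminal cannot decode, which matches what I am seeing for the copyright symbol.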

Is there a way to fix this issue? I am running Perl 5.18.2 on Mac OS X 10.10 (Yosemite), which uses UTF-8 natively.
