UnicodeDecodeError using GLib.utf8_collate_key in Windows


I'm using Python 3.3 and PyGObject 3.14 on Windows 7 and have the following problem: calling gi.repository.GLib.utf8_collate_key with a string that contains non-ASCII characters always results in a UnicodeDecodeError.

Test case:

>>> from gi.repository import GLib
>>> asciiText = "a"
>>> unicodeText = "á"

>>> asciiText.encode()
b'a'

>>> unicodeText.encode()
b'\xc3\xa1'

>>> GLib.utf8_collate_key(asciiText, -1)
'Aa'

>>> GLib.utf8_collate_key(unicodeText, -1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 1: unexpected end of data

Expected result (from Linux):

>>> GLib.utf8_collate_key(asciiText, -1)
'a'

>>> GLib.utf8_collate_key(unicodeText, -1)
'á'

The Windows system's locale is set to Portuguese (Brazil).
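The 'Aa' result for plain "a" suggests that the Windows build of GLib prepends a marker character to the collation key. One plausible explanation, which is an assumption and not confirmed here, is that on this code path GLib converts the string to the system ANSI code page (CP1252 for a Portuguese (Brazil) locale) before collating. The key for "á" would then contain the raw byte 0xe1, which PyGObject cannot decode as UTF-8. The sketch below does not call GLib; it only shows that decoding such hypothetical bytes in plain Python reproduces exactly the reported message:

```python
def reproduce_decode_error():
    # Hypothetical reconstruction of the bytes PyGObject may receive on
    # Windows: an 'A' prefix followed by "á" in the CP1252 (ANSI) code page.
    raw = b"A" + "á".encode("cp1252")  # b'A\xe1'
    try:
        # The error message suggests PyGObject decodes the returned C string
        # as UTF-8; 0xe1 starts a 3-byte UTF-8 sequence, but the data ends
        # right after it.
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


print(reproduce_decode_error())
# 'utf-8' codec can't decode byte 0xe1 in position 1: unexpected end of data
```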

Does anybody know how to solve this? I'm considering rolling my own collation function if I can't get this to work.
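As a fallback along the lines of rolling a custom collation function, a minimal sketch is shown below, assuming a rough, locale-independent ordering is acceptable. `simple_collate_key` is a hypothetical helper, not part of GLib or PyGObject; it does not implement the full Unicode Collation Algorithm or any Portuguese-specific rules. It only groups accented letters next to their base letters instead of sorting them by code point:

```python
import unicodedata


def simple_collate_key(text):
    """Hypothetical, approximate sort key (not locale-aware).

    Primary level: base letters with accents removed, case-folded.
    Secondary level: the case-folded text including accents.
    Tertiary level: the original text, so the ordering is total.
    """
    # NFD splits "á" into "a" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks (Unicode category Mn) to get the base letters.
    base = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return (base.casefold(), decomposed.casefold(), text)


words = ["b", "á", "a", "c"]
print(sorted(words))                          # ['a', 'b', 'c', 'á']
print(sorted(words, key=simple_collate_key))  # ['a', 'á', 'b', 'c']
```

Because it is a tuple key, it can be passed directly as `key=` to `sorted` or `list.sort` without touching GLib at all; `str.casefold` requires Python 3.3 or later, which matches the version in the question.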
