I'm using Python's zipfile module to extract .zip files which can contain files with Unicode filenames. WinZip and 7-Zip archives work fine, but WinRAR encodes the filenames a little differently. Say I create a zip file containing a file called "-★-私-", and extract it with this:
import zipfile

with zipfile.ZipFile(zip_file_path, 'r') as zf:
    zf.extractall(extract_dir)
This extracts "-★-私-" as "-#U2605-#U79c1-". The ZipInfo object's filename isn't encoded; it's just a regular ASCII string containing the output filename.
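By default, zipfile decodes a stored name as UTF-8 only when bit 11 (0x800) of the entry's general-purpose flags is set, and as code page 437 otherwise. A minimal sketch for checking what an archive actually stores is below. The helper `describe_names` is hypothetical, and the small in-memory archive only stands in for a WinRAR-made one: its second entry imitates the "#U" name from the question.

```python
import io
import zipfile

# Bit 11 of the general-purpose flags marks the stored name as UTF-8.
UTF8_FLAG = 0x800

def describe_names(zip_source):
    """Return (filename, utf8_flag_set) for every entry in the archive."""
    with zipfile.ZipFile(zip_source, "r") as zf:
        return [(info.filename, bool(info.flag_bits & UTF8_FLAG))
                for info in zf.infolist()]

# Build a demonstration archive in memory. Python's zipfile sets the
# UTF-8 flag itself when a name is not pure ASCII.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("-★-私-", b"utf-8 name")
    zf.writestr("-#U2605-#U79c1-", b"ascii escaped name")

for name, is_utf8 in describe_names(buf):
    print(repr(name), "UTF-8 flag set" if is_utf8 else "no UTF-8 flag")
```

An entry without the flag whose name still contains "#U" sequences is a sign that the escaping happened when the archive was written, not during extraction.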
I'd like to translate the string, which contains the Unicode code points U+2605 and U+79C1, into a usable, printable Unicode string. So I wrote this, but it doesn't convert the characters properly:
string = codePoints.replace('#U', '\\u').encode('utf-8')
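The replace call inserts the two characters backslash and `u`; escape sequences are interpreted only when Python parses a string literal, so the result still holds literal text like `\u2605`, and `.encode('utf-8')` simply encodes those ASCII characters. One way to have the escapes interpreted at runtime is the `unicode_escape` codec. This is a sketch, assuming the name is pure ASCII, contains no other backslashes, and that every "#U" is followed by exactly four hex digits; `decode_via_unicode_escape` is a hypothetical helper name.

```python
def decode_via_unicode_escape(name):
    # Turn "#U2605" into the six literal characters "\u2605" ...
    escaped = name.replace("#U", "\\u")
    # ... then let the unicode_escape codec interpret them.
    # encode("ascii") raises if the name already contains non-ASCII text.
    return escaped.encode("ascii").decode("unicode_escape")

raw = "-#U2605-#U79c1-"
print(repr(raw.replace("#U", "\\u")))  # '-\\u2605-\\u79c1-' (literal backslashes)
print(decode_via_unicode_escape(raw) == "-\u2605-\u79c1-")  # True
```

Note that `unicode_escape` would also interpret any other backslash sequence in the name, which is why the assumption about backslashes matters.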
Where have I gone wrong here? I'm not getting the same result I would get if I did:
string = '-\u2605-\u79c1-'.encode('utf-8')
(This assumes Python 3; in Python 2, I would prefix that string literal with "u".)
I am not sure if this is what you are looking for:
For instance:
prints
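A regular expression is another way to do the conversion, and it touches nothing but the "#U" sequences. The sketch below assumes WinRAR always writes "#U" followed by exactly four hex digits (only the example in the question confirms this form). The helper names `decode_hash_u` and `extract_with_decoded_names` are hypothetical. It renames each ZipInfo before extracting it and skips entries whose UTF-8 flag (0x800) is already set.

```python
import os
import re
import zipfile

# Assumption: each non-ASCII character is stored as "#U" plus exactly
# four hex digits, as in "-#U2605-#U79c1-".
_HASH_U = re.compile(r"#U([0-9A-Fa-f]{4})")
UTF8_FLAG = 0x800

def decode_hash_u(name):
    """Replace every #UXXXX sequence with the character U+XXXX."""
    return _HASH_U.sub(lambda m: chr(int(m.group(1), 16)), name)

def extract_with_decoded_names(zip_path, extract_dir):
    """Extract all members, decoding #UXXXX escapes in names that are not
    already flagged as UTF-8. Returns the list of extracted paths."""
    extracted = []
    with zipfile.ZipFile(zip_path, "r") as zf:
        for info in zf.infolist():
            if not info.flag_bits & UTF8_FLAG:
                # In CPython, extract() builds the output path from
                # info.filename, while reading the member checks the local
                # header against info.orig_filename, so renaming is safe.
                info.filename = decode_hash_u(info.filename)
            extracted.append(zf.extract(info, os.fspath(extract_dir)))
    return extracted
```

Usage would be `extract_with_decoded_names(zip_file_path, extract_dir)` in place of `zf.extractall(extract_dir)`. Unlike the `unicode_escape` approach, this leaves any other backslashes or "#" characters in the name untouched.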