MBCS to UTF-8: How to encode in Python

I am trying to create a duplicate file finder for Windows. My program works well on Linux, but on Windows it writes NUL characters to the log file. This is due to the default file system encoding on Windows being MBCS, while the file system encoding on Linux is UTF-8. How can I convert MBCS to UTF-8 to avoid this error?

There are 2 best solutions below

3
On

Tell Python to use UTF-8 for the log file. In Python 3, pass the encoding argument when you open it:

open(..., encoding='utf-8')
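As a minimal sketch of the idea above (the helper name write_log, the file name duplicates.log, and the sample filenames are made up for illustration), the log can be opened with an explicit encoding so that the output no longer depends on the platform default:

```python
import os
import tempfile


def write_log(log_path, filenames):
    """Write one filename per line, always encoded as UTF-8."""
    # An explicit encoding overrides the platform default.
    # newline="\n" keeps line endings identical on Windows and Linux.
    with open(log_path, "w", encoding="utf-8", newline="\n") as log:
        for name in filenames:
            log.write(name + "\n")


# Demo with hypothetical filenames, written to a temporary directory.
demo_dir = tempfile.mkdtemp()
demo_log = os.path.join(demo_dir, "duplicates.log")
write_log(demo_log, ["résumé.txt", "日本語.txt"])

with open(demo_log, "rb") as f:
    raw = f.read()

print(b"\x00" in raw)  # False: UTF-8 text of these names has no NUL bytes
```

Reading the file back in binary mode is just a quick way to check which bytes actually landed on disk.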

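In Python 3 a filename is already a Unicode str, so the string itself needs no conversion; only the bytes written to the file have an encoding. If you have raw bytes in MBCS (the Windows ANSI code page) and want UTF-8 bytes, decode them first and then encode:

raw_bytes.decode('mbcs').encode('utf-8')

The 'mbcs' codec exists only on Windows. Use raw_bytes.decode(sys.getfilesystemencoding()).encode('utf-8') to make the code work on Linux as well. Note that sys.getdefaultencoding() is always 'utf-8' in Python 3, so it does not tell you the file system encoding.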
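Converting bytes from a legacy code page to UTF-8 always means decoding to str first and then encoding. Here is a rough sketch, where to_utf8 is a hypothetical helper and cp1252 stands in for 'mbcs' so the example also runs on Linux (the 'mbcs' codec is only available on Windows):

```python
import sys


def to_utf8(raw, source_encoding):
    """Re-encode bytes from source_encoding to UTF-8."""
    # bytes -> str using the encoding the bytes were written in,
    # then str -> bytes as UTF-8.
    return raw.decode(source_encoding).encode("utf-8")


# cp1252 is an assumption for the demo; on Windows you could pass 'mbcs'.
legacy_bytes = "café".encode("cp1252")
utf8_bytes = to_utf8(legacy_bytes, "cp1252")
print(utf8_bytes)  # b'caf\xc3\xa9'

# The encoding Python uses for filenames on this platform.
print(sys.getfilesystemencoding())
```

The second print is only informational; its value depends on the platform and Python version.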

0
On

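Just change the encoding to 'latin-1' (encoding='latin-1'). Latin-1 maps every possible byte to a character, so reading never raises a decoding error, although text that was really in another encoding will come out as the wrong characters.

Using pure Python: open(..., encoding='latin-1')

Using Pandas: pd.read_csv(..., encoding='latin-1')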
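To see what latin-1 does in practice, here is a small illustration using only the standard library (the sample text is made up). Every byte value decodes without error, but UTF-8 data read as latin-1 turns into mojibake:

```python
# latin-1 assigns a character to each of the 256 byte values,
# so decoding can never fail.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")
print(len(decoded))  # 256

# The cost: bytes that were really UTF-8 are silently misread.
utf8_bytes = "é".encode("utf-8")       # b'\xc3\xa9'
mojibake = utf8_bytes.decode("latin-1")
print(mojibake)  # Ã©

# The damage is reversible only if you know the true encoding.
repaired = mojibake.encode("latin-1").decode("utf-8")
print(repaired)  # é
```

So latin-1 is a way to avoid exceptions, not a way to get the right text, unless the file really is latin-1.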