Wrong encoding when redirecting printed unicode characters on Windows PowerShell

Using Python 3, running the following code

print("some box drawing:")
print("┌─┬┼┴┐")

via

py my_app.py

prints

some box drawing:
┌─┬┼┴┐

As you would expect.

However, if you redirect this (on either Windows or Linux) with

py my_app.py > redirected.txt

you get the following exception:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-5: character maps to <undefined>

As has been suggested in many other posts, this exception can be "fixed" by calling sys.stdout.reconfigure(encoding='utf-8') prior to printing. On Linux and in the Windows cmd, that's it, problem solved. Using PowerShell on Windows, however, the redirected output looks like this:

some box drawing:
ΓöîΓöÇΓö¼Γö╝Γö┤ΓöÉ

Which is especially odd, since it works fine using the cmd.exe console.

The code base is delivered to a customer as an executable, and I would rather not ask them to run something in the console first just so my program works reliably. Is there a programmatic way to have box drawing characters written correctly when redirecting output to a file in Windows PowerShell?

There are 2 best solutions below

Neuron On

From this answer, I learned that redirecting to UTF-8 in PowerShell simply does not work, but UTF-16 does. Executing the following code on startup worked for me with and without a redirect, and in a number of different consoles:

import os
import sys

is_redirected = not sys.stdout.isatty()
if is_redirected:
    is_power_shell = len(os.getenv('PSModulePath', '').split(os.pathsep)) >= 3
    if is_power_shell:
        sys.stdout.reconfigure(encoding='utf-16')
    else:
        sys.stdout.reconfigure(encoding='utf-8')

I decided to set the encoding only when output is redirected, and to use utf-16 only in PowerShell, because I wanted to avoid running into unforeseen encoding problems with other setups. The snippet that detects PowerShell is taken from this answer, and the snippet that detects a redirect from this answer.

I myself find this solution a little messy. If you find a better solution, I am happy to accept it.
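One way to make the snippet above a little easier to test is to separate the decision from the side effect. This is only a sketch of the same heuristic, not a different detection method: the function names choose_redirect_encoding and configure_stdout are made up here, and the pathsep parameter exists only so the logic can be exercised on any platform.

```python
import os
import sys


def choose_redirect_encoding(is_tty, ps_module_path, pathsep=os.pathsep):
    """Return the encoding to apply to stdout, or None to leave it alone.

    Hypothetical helper: same heuristic as the snippet above, where three
    or more PSModulePath entries are taken as a sign of PowerShell.
    """
    if is_tty:
        # Not redirected: keep Python's default console handling.
        return None
    looks_like_powershell = len(ps_module_path.split(pathsep)) >= 3
    return 'utf-16' if looks_like_powershell else 'utf-8'


def configure_stdout():
    """Apply the chosen encoding to sys.stdout (hypothetical helper)."""
    encoding = choose_redirect_encoding(
        sys.stdout.isatty(), os.getenv('PSModulePath', ''))
    if encoding is not None:
        sys.stdout.reconfigure(encoding=encoding)
```

Calling configure_stdout() once at startup, before the first print, should behave like the original snippet. The PSModulePath check remains a heuristic, so it can misfire on machines where that variable is set differently.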

Mark Tolonen On

When Python writes directly to a Windows console, it uses Unicode APIs internally and doesn't care what the console encoding is set to. When output is redirected to a file, it can't know which encoding the reader expects, so it falls back to an OS-specific default. That's why you can use sys.stdout.reconfigure to tell it.

Python also has an environment variable PYTHONIOENCODING which can tell it the encoding to use as well. chcp is the shell command that will tell you what the terminal expects.
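To watch PYTHONIOENCODING take effect without touching the shell, a small sketch can start a child interpreter with the variable set and inspect the raw bytes it writes to a pipe. The helper name run_with_io_encoding is made up for illustration, and the box drawing characters are passed as escapes so the command line itself does not depend on any encoding.

```python
import os
import subprocess
import sys


def run_with_io_encoding(encoding):
    """Run a child Python with PYTHONIOENCODING set and return its raw stdout.

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    # The child decodes the \u escapes itself, so only ASCII crosses the
    # command line.
    code = "print('\\u250c\\u2500\\u252c\\u253c\\u2534\\u2510')"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


utf8_bytes = run_with_io_encoding("utf-8").strip()
cp437_bytes = run_with_io_encoding("cp437").strip()
```

The same text comes back as 18 bytes under UTF-8 and 6 bytes under cp437, which is why a file written with one setting looks wrong when displayed with the other.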

Example:

C:\tmp>chcp
Active code page: 437                     # Legacy U.S. DOS encoding

C:\tmp>py -c "print('┌─┬┼┴┐')"            # this uses Unicode APIs
┌─┬┼┴┐

C:\tmp>py -c "print('┌─┬┼┴┐')" >x         # this uses an OS-specific default encoding
Traceback (most recent call last):        # Windows-1252 on U.S. Windows.
  File "<string>", line 1, in <module>
  File "D:\dev\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-5: character maps to <undefined>

C:\tmp>set PYTHONIOENCODING=cp437         # code page 437 supports box drawing characters

C:\tmp>py -c "print('┌─┬┼┴┐')" >x         # file is written encoded in cp437

C:\tmp>type x                             # matches terminal encoding and displays correctly
┌─┬┼┴┐

C:\tmp>chcp 65001                         # UTF-8 code page
Active code page: 65001

C:\tmp>type x                             # cp437 doesn't decode properly
������

C:\tmp>set PYTHONIOENCODING=utf8          # Use UTF8

C:\tmp>py -c "print('┌─┬┼┴┐')" >x         # write file encoded in UTF8

C:\tmp>type x                             # matches terminal code page now
┌─┬┼┴┐
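The garbled PowerShell output in the question is consistent with UTF-8 bytes being decoded as cp437 somewhere along the way. As far as I understand, Windows PowerShell decodes a native program's output using the console output encoding before writing the redirect target, but treat that as an assumption. The sketch below only reproduces the symptom in pure Python; misdecode and can_encode are illustrative names.

```python
BOX = "\u250c\u2500\u252c\u253c\u2534\u2510"  # the box drawing string from the question


def misdecode(text, written_as="utf-8", read_as="cp437"):
    """Encode text one way and decode it another, to reproduce mojibake."""
    return text.encode(written_as).decode(read_as)


def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


garbled = misdecode(BOX)              # UTF-8 bytes read as cp437
cp1252_ok = can_encode(BOX, "cp1252")  # the default that raised the exception
cp437_ok = can_encode(BOX, "cp437")    # the legacy console code page
```

Here garbled matches the ΓöîΓöÇΓö¼Γö╝Γö┤ΓöÉ line from the question, cp1252_ok is False (hence the UnicodeEncodeError on redirect), and cp437_ok is True (hence PYTHONIOENCODING=cp437 working in the transcript above).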