Here's the situation:
- I have a UiPath process containing an Invoke Power Shell activity
- In the Invoke Power Shell activity (set as IsScript), my script is `python '/path/to/my/script.py'`, and the output is saved as a string
When I run my `script.py` file in any PowerShell console on my computer, the output I get is "Cédric", but when I run the script through UiPath, the output I get is "CÚdric". I understand that the issue is somehow related to encoding.
After some research, I found that running the PowerShell command `[System.Text.Encoding]::Default.EncodingName` gives different results:
- In my system PowerShell: "Western Europe (Windows)"
- In the UiPath PowerShell: "Unicode (UTF-8)"
I found that the hex value of "é" is 0xE9 in the Windows-1252 encoding, but that in the CP850 encoding 0xE9 is "Ú", so I guess this is the encoding relation I'm looking for. However, I tried many things in UiPath (C#) and in PowerShell commands, and nothing resolved the problem (I tried both changing encoding values and converting the string into bytes to change the output encoding).
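For reference, that relation is easy to reproduce in Python itself:

```python
# One byte, 0xE9, decodes differently under the two code pages.
raw = b"\xe9"
print(raw.decode("cp1252"))  # é  (Windows-1252, the ANSI code page)
print(raw.decode("cp850"))   # Ú  (CP850, the OEM code page)
```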
And to anticipate some questions:
- No, I won't use the "Invoke Python Script" activity in UiPath, as it's broken
- Yes, I need to use this Python script
- Yes, I could use a `Replace("Ú","é")` on the string output, BUT I don't want to do that blindly for every special character that could come up, especially when there's a logical reason behind the behavior
TL;DR: The issue arises when UiPath interprets the output of the PowerShell console running the Python script.
I've been stuck on this for 3 days now, only to get 2% more precision on the project I'm working on (which is completely fine otherwise), so it's not worth the time I'm spending on it, but I need to know.
As for `[System.Text.Encoding]::Default`: that you're seeing UTF-8 as the value in UiPath implies that it is using PowerShell (Core) 7+ (`pwsh.exe`), the modern, install-on-demand, cross-platform edition built on .NET 5+, whereas Windows PowerShell (`powershell.exe`), the legacy, ships-with-Windows, Windows-only edition, is built on .NET Framework.

PowerShell honors the system's active legacy OEM code page by default when interpreting output from external programs (such as Python scripts),[1] e.g. 850, as reported by `chcp` and as reflected in `[Console]::OutputEncoding` from inside PowerShell.

That is, PowerShell interprets the byte stream received from external programs as text encoded according to `[Console]::OutputEncoding`, and decodes it that way, resulting in a Unicode in-memory string representation, given that PowerShell is built on .NET, whose strings are composed of UTF-16 Unicode code units (`[char]`). If `[Console]::OutputEncoding` doesn't match the encoding that the external program actually uses, misinterpreted text can be the result, as in your case.[2] `python script.py` results in `Cédric` printing to the console, but `python script.py | Write-Output`, due to its use of a pipeline, involves interpretation by PowerShell, and the encoding mismatch results in `CÚdric`.

A UTF-8 opt-in is available:
Before calling the Python script, switch PowerShell to UTF-8 for communicating with external programs (see this answer for background information).
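A minimal sketch of that opt-in, using the standard `[Console]` properties and the `$OutputEncoding` preference variable (session-scoped; the exact snippet may differ from the one in the referenced answer):

```powershell
# Make PowerShell both send text to and decode text from external
# programs as UTF-8, for the current session only.
$OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding =
    [System.Text.UTF8Encoding]::new()
```

Note that Python must then also be told to emit UTF-8, via the opt-ins described below.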
Python, by contrast, defaults to the system's active legacy ANSI code page (e.g. Windows-1252).[3]
A UTF-8 opt-in is available, either:
- By defining environment variable `PYTHONUTF8` with value `1`: before calling your Python script, execute `$env:PYTHONUTF8=1` in PowerShell.
- Or, in Python 3.7+, with explicit `python` CLI calls, by using the `-X utf8` option (case matters).
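As a quick check of the first option, a child interpreter launched with `PYTHONUTF8=1` reports UTF-8 on its stdout (a sketch, standard library only):

```python
import os
import subprocess
import sys

# Run a child Python interpreter with UTF-8 mode enabled via PYTHONUTF8=1
# and have it report which encoding its stdout uses.
env = dict(os.environ, PYTHONUTF8="1")
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
    capture_output=True, text=True, env=env,
)
print(result.stdout.strip())  # utf-8
```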
Given the above, and assuming that your Python script only ever outputs characters that are part of the Windows-1252 code page, the alternative is to leave Python at its defaults and (temporarily) set the console encoding to Windows-1252 instead of UTF-8.
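A sketch of that alternative, assuming the standard .NET `[System.Text.Encoding]::GetEncoding()` API (again session-scoped):

```powershell
# Temporarily have PowerShell decode external-program output as Windows-1252.
$OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding =
    [System.Text.Encoding]::GetEncoding(1252)
```

To restore the original behavior afterwards, save the previous `[Console]::OutputEncoding` value to a variable first and assign it back when done.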
There is an option to avoid this per-session configuration altogether: configuring Windows to use UTF-8 system-wide, as described in this answer, which sets both the active OEM and the active ANSI code page to 65001, i.e. UTF-8.

Caveat: This feature, still in beta as of Windows 11 22H2, has far-reaching consequences:
It causes preexisting, BOM-less files encoded based on the culture-specific ANSI code page (e.g. Windows-1252) to be misinterpreted by default by Windows PowerShell, Python, and generally all non-Unicode Windows applications.
Note that .NET applications, including PowerShell (Core) 7+ (but not Windows PowerShell),[1] have the inverse problem, which they must deal with irrespective of this setting: because they assume that a BOM-less file is UTF-8-encoded, they must specify the culture-specific legacy ANSI code page explicitly when reading such files.
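The misinterpretation risk for BOM-less ANSI files can be illustrated in Python (a sketch using only the standard library and a throwaway temp file):

```python
import os
import tempfile

# Write "Cédric" to a BOM-less file using the Windows-1252 code page.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write("Cédric".encode("cp1252"))

# Reading with the correct legacy encoding recovers the text.
with open(path, encoding="cp1252") as f:
    recovered = f.read()
print(recovered)  # Cédric

# A reader that assumes UTF-8 (as .NET does for BOM-less files)
# fails on the lone 0xE9 byte instead.
try:
    with open(path, encoding="utf-8") as f:
        f.read()
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)  # False

os.remove(path)
```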
[1] PowerShell-native commands and scripts, which run in-process, consistently communicate text via in-memory Unicode strings, due to using .NET strings, so no encoding problems can arise.
When it comes to reading files, Windows PowerShell defaults to the ANSI code page when reading source code and text files with `Get-Content`, whereas PowerShell (Core) 7+ now, commendably, consistently defaults to UTF-8, also with respect to what encoding is used to write files; see this answer for more information.

[2] Specifically, Python outputs byte `0xE9`, meaning it to be character `é`, due to using Windows-1252 encoding. PowerShell misinterprets this byte as referring to character `Ú`, because it decodes it per CP850, as reflected in `[Console]::OutputEncoding`. Compare `[Text.Encoding]::GetEncoding(1252).GetString([byte[]] 0xE9)` (-> `é`, whose Unicode code point is `0xE9` too, because Unicode is mostly a superset of Windows-1252) to `[Text.Encoding]::GetEncoding(850).GetString([byte[]] 0xE9)` (-> `Ú`, whose Unicode code point is `0xDA`).

[3] This applies when Python's stdout / stderr streams are connected to something other than a console, such as when their output is captured by PowerShell.