How to get Unicode data from the Azure DevOps Git repository Get Item REST API?

I prepared the following request to get a file's content from the Azure DevOps repo Items API. The file is stored in Git as UTF-8, but the REST API output is not as expected. How can I fix this so that I get the content exactly as it is stored in the repo?

$uri = "http://devserver/defaultcollection/3e100875-e1dc-4aa4-a9d0-0e97af8a1634/_apis/git/repositories/f26ea979-3786-4bca-965e-0481c07ff9a9/items/Notes%2FREADME.md?versionType=Commit&version=26613c4596f233b0f48ea0f407465d941f0a4144&api-version=7.0"
$contentType  = "application/json;charset=utf-8"
$headers = @{ Authorization = "Basic $encodedPAT" }

$fileContent = Invoke-RestMethod -Uri $uri -Headers $headers -ContentType $contentType -Method Get

The output is Markdown content:

Title|Description|WorkItemID|Software|Area|Type|BuildNumber|Date
-|-|-|-|-|-|-|-
Ø±ÙØ¹ اشکا٠ÙÙØ§ÛØ´ Ø¯Ø§Ø¯Ù ÙØ´Ø¯Ù ÙØ§Ù ÙÙØ§ÛØ´Û ÙØ¯Ø¹ÙÛ٠در ØµÙØ­Ù ÙØ´Ø§ÙØ¯Ù Ø¬ÙØ³Ù|this is description|409925|Organizer||Bug|20231206.1|2023-12-06

mklement0 (best answer)

tl;dr

  • Your -ContentType argument has no effect; to ask the target web service to return a JSON response - assuming it supports it - you'll need to:

    • Use an Accept header field, e.g.

       -Headers @{ Accept = 'application/json'; Authorization = "Basic $encodedPAT" }
      
    • Alternatively, if available, in the context of a GET request, use a query-string parameter to that effect as part of the URL.

  • The problem isn't specific to Azure; it is a general problem with PowerShell's web cmdlets. As detailed in the background information below, Windows PowerShell and PowerShell (Core) versions 7.3.x and below mis-decode UTF-8 responses that aren't declared as such in the Content-Type field of the response header. This is no longer a problem in PowerShell (Core) 7.4+, which now (consistently) defaults to UTF-8.
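As a minimal sketch of the Accept-header option from the first bullet: the snippet below only builds the headers hashtable, so nothing is sent until the commented-out call is enabled. The `$pat` value is a hypothetical placeholder, and the `":<PAT>"` Base64 construction of `$encodedPAT` is an assumption, since the question does not show how that variable was created. Whether the service actually returns JSON for this endpoint should be checked against its documentation.

```powershell
# Hypothetical placeholder; substitute a real personal access token (PAT).
$pat = 'my-pat'

# Assumption: Basic authentication with an empty user name, i.e. ":<PAT>".
$encodedPAT = [Convert]::ToBase64String(
    [System.Text.Encoding]::ASCII.GetBytes(":$pat")
)

# Accept states what you would like to receive; -ContentType does not.
$headers = @{
    Accept        = 'application/json'
    Authorization = "Basic $encodedPAT"
}

# With $uri defined as in the question, the call would then be:
# $response = Invoke-RestMethod -Uri $uri -Headers $headers -Method Get
```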

To ensure decoding as UTF-8, use Invoke-WebRequest rather than Invoke-RestMethod; the former's output objects have a .RawContentStream property that returns a raw byte stream that you can decode with the encoding of your choice.

Applied to your code (as noted, only required in PowerShell versions 7.3.x and below, including in Windows PowerShell):

$uri = "http://devserver/defaultcollection/3e100875-e1dc-4aa4-a9d0-0e97af8a1634/_apis/git/repositories/f26ea979-3786-4bca-965e-0481c07ff9a9/items/Notes%2FREADME.md?versionType=Commit&version=26613c4596f233b0f48ea0f407465d941f0a4144&api-version=7.0"
$headers = @{ Authorization = "Basic $encodedPAT" }

$fileContent = 
 [System.Text.Encoding]::UTF8.GetString(
   (
     Invoke-WebRequest -Uri $uri -Headers $headers -Method Get
   ).RawContentStream.ToArray()
 )

Note the use of [System.Text.Encoding]::UTF8 to obtain a UTF-8 encoding, and its .GetString() method to convert an array of bytes to a .NET string.
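To see why the manual decoding helps, the following offline sketch reproduces the mis-decoding without any web request. The sample text is an arbitrary assumption (four Arabic letters built from code points, so the script file's own encoding does not matter), and the last step shows that a string already mis-decoded as ISO 8859-1 can be recovered, provided ISO 8859-1 really was the encoding that was wrongly applied.

```powershell
# Arbitrary sample text: U+0633 U+0644 U+0627 U+0645.
$original = -join ([char[]](0x0633, 0x0644, 0x0627, 0x0645))

# The bytes a server would send for this text when it is stored as UTF-8.
$bytes = [System.Text.Encoding]::UTF8.GetBytes($original)

# What the older web cmdlets effectively do when no charset is declared.
$latin1     = [System.Text.Encoding]::GetEncoding('ISO-8859-1')
$misDecoded = $latin1.GetString($bytes)

# What the manual decoding shown above does instead.
$decoded = [System.Text.Encoding]::UTF8.GetString($bytes)

# Recovery of an already mis-decoded string: ISO 8859-1 maps every byte
# to exactly one character, so the original bytes can be restored.
$repaired = [System.Text.Encoding]::UTF8.GetString($latin1.GetBytes($misDecoded))

$misDecoded                # garbled text in the style of the question's output
$decoded  -ceq $original   # True
$repaired -ceq $original   # True
```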


Background information:

  • The -ContentType parameter describes the media type and, optionally, character encoding of the body (data) sent with the request, not what you'd like to receive as a response.

    • Since you're merely performing a GET request without using the -Body parameter, the -ContentType argument is effectively ignored.

    • While a header field is generally available that signals to the server what response character encoding is desired - Accept-Charset - it is rarely honored in practice.
      I presume the same applies if you use a charset parameter in the context of also requesting specific media types, via the Accept header field.

  • It is therefore the server that decides what character encoding to encode the response with and, crucially, whether or not to explicitly indicate that encoding in the Content-Type response-header field, e.g. Content-Type: text/markdown; charset=utf-8

    • Strictly speaking, the media type for Markdown text, text/markdown - assuming that it is used in the server's response - should contain a charset parameter, which PowerShell's web cmdlets do honor.

    • In the absence of such a charset parameter, it is therefore the default character encoding that applies, as used by PowerShell's web cmdlets, Invoke-WebRequest and Invoke-RestMethod.
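Whether a response declares its encoding can be checked by parsing the Content-Type value. The sketch below uses .NET's System.Net.Mime.ContentType class on two hard-coded example values; in a real call the value would come from the response's Content-Type header, which is not fetched here.

```powershell
# Hard-coded example header values (assumptions for illustration only).
$declared   = New-Object System.Net.Mime.ContentType 'text/markdown; charset=utf-8'
$undeclared = New-Object System.Net.Mime.ContentType 'text/markdown'

# The media type and the optional charset parameter are exposed separately.
$declared.MediaType
$declared.CharSet

# No charset parameter: the cmdlets fall back to their default encoding.
$usesDefaultEncoding = [string]::IsNullOrEmpty($undeclared.CharSet)
$usesDefaultEncoding
```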

The default character encoding used by the Invoke-WebRequest and Invoke-RestMethod cmdlets depends on the PowerShell edition and version, as shown in the following table:

Edition            | Version                                | Default
Windows PowerShell | up to 5.1, the latest and last version | ISO 8859-1[1]
PowerShell (Core)  | 7.0 - 7.3.x                            | ISO 8859-1, except for application/json responses,[2] which default to UTF-8
PowerShell (Core)  | 7.4 and above                          | UTF-8

  • This default encoding not only applies to decoding responses, but also to encoding request data, namely when you pass a string to the -Body parameter (you may alternatively pass arbitrary [byte] arrays); you can override this with a charset parameter in the -ContentType argument, e.g.:
    -ContentType 'application/json; charset=utf-8'

  • If, in a given call, the response body gets mis-decoded due to the above-mentioned defaults, you need to manually decode the raw bytes, as shown in the top section.
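On the request side, the effect of the default encoding can be illustrated offline. This sketch compares how a JSON string with non-Latin characters is encoded as ISO 8859-1 versus UTF-8; the JSON payload is a made-up example, and the commented-out call assumes a `$uri` that accepts POST requests.

```powershell
# Made-up JSON payload containing four Arabic letters (built from code points).
$text = -join ([char[]](0x0633, 0x0644, 0x0627, 0x0645))
$json = '{"title":"' + $text + '"}'

# Encoding as ISO 8859-1 cannot represent these characters, so they are lost.
$latin1      = [System.Text.Encoding]::GetEncoding('ISO-8859-1')
$latin1Bytes = $latin1.GetBytes($json)
$latin1.GetString($latin1Bytes) -ceq $json    # False: the text did not survive

# Encoding as UTF-8 preserves them.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($json)
[System.Text.Encoding]::UTF8.GetString($utf8Bytes) -ceq $json    # True

# Passing the bytes sidesteps the default; the charset parameter tells the server.
# Invoke-RestMethod -Uri $uri -Method Post -Body $utf8Bytes `
#     -ContentType 'application/json; charset=utf-8'
```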


[1] This encoding is largely identical to Windows-1252, except that the following characters are missing, notably including the euro sign (€):
€ ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ Ž ‘ ’ “ ” • – — ˜ ™ š › œ ž Ÿ

[2] Note that request JSON data passed as a string to the -Body parameter is, curiously, still encoded as ISO 8859-1 by default, an inconsistency that was resolved in v7.4.

Alvin Zhao - MSFT

I can reproduce the issue when running the Invoke-RestMethod request in Windows PowerShell version 5, the default on Windows Server 2022. After installing PowerShell 7, the same request displayed the expected content.

The issue does not seem to be caused by Azure DevOps, and it is irrelevant whether the file content is stored as UTF-8 or Markdown in your Azure DevOps Git repo. It is caused by how PowerShell processes UTF-8.
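Putting the two answers together, a script that has to run on several PowerShell versions could check the version first. The sketch below is one possible way to do that; both function names are hypothetical, and the version threshold follows the table in the first answer (it is deliberately conservative for 7.0 - 7.3.x, where JSON responses already default to UTF-8).

```powershell
# Hypothetical helper: $true for versions whose web cmdlets do not default to UTF-8.
function Test-NeedsManualUtf8Decoding {
    param(
        [int] $Major = $PSVersionTable.PSVersion.Major,
        [int] $Minor = $PSVersionTable.PSVersion.Minor
    )
    ($Major -lt 7) -or ($Major -eq 7 -and $Minor -lt 4)
}

# Hypothetical helper: always decodes the raw response bytes as UTF-8,
# which is required before 7.4 and harmless from 7.4 onward.
function Get-Utf8WebContent {
    param(
        [Parameter(Mandatory = $true)] [string] $Uri,
        [hashtable] $Headers = @{}
    )
    $response = Invoke-WebRequest -Uri $Uri -Headers $Headers -Method Get -UseBasicParsing
    [System.Text.Encoding]::UTF8.GetString($response.RawContentStream.ToArray())
}

# Usage, with $uri and $headers defined as in the question:
# $fileContent = Get-Utf8WebContent -Uri $uri -Headers $headers
```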