I prepared the following request to get a file's content from the Azure DevOps repo Items API. The file is stored in Git in UTF-8 format, but the output of the REST API is not as expected. How can I fix this so that I get the content exactly as it is stored in the repo?
$uri = "http://devserver/defaultcollection/3e100875-e1dc-4aa4-a9d0-0e97af8a1634/_apis/git/repositories/f26ea979-3786-4bca-965e-0481c07ff9a9/items/Notes%2FREADME.md?versionType=Commit&version=26613c4596f233b0f48ea0f407465d941f0a4144&api-version=7.0"
$contentType = "application/json;charset=utf-8"
$headers = @{ Authorization = "Basic $encodedPAT" }
$fileContent = Invoke-RestMethod -Uri $uri -Headers $headers -ContentType $contentType -Method Get
The output is Markdown content:
Title|Description|WorkItemID|Software|Area|Type|BuildNumber|Date
-|-|-|-|-|-|-|-
Ø±ÙØ¹ اشکا٠ÙÙØ§ÛØ´ Ø¯Ø§Ø¯Ù ÙØ´Ø¯Ù ÙØ§Ù ÙÙØ§ÛØ´Û ÙØ¯Ø¹ÙÛ٠در ØµÙØÙ ÙØ´Ø§ÙØ¯Ù Ø¬ÙØ³Ù|this is description|409925|Organizer||Bug|20231206.1|2023-12-06

tl;dr

- Your `-ContentType` argument has no effect; to ask the target web service to return a JSON response (assuming it supports that), you'll need to:
  - Use an `Accept` header field, e.g. `-Headers @{ Accept = 'application/json' }`.
  - Alternatively, if available, in the context of a `GET` request, use a query-string parameter to that effect as part of the URL.
- The problem isn't specific to Azure; it is a general problem with PowerShell's web cmdlets. As detailed in the background information below, Windows PowerShell and older versions of PowerShell (Core) 7+ mis-decode UTF-8 responses that aren't declared as such in the `Content-Type` field of the response header. This is no longer a problem in PowerShell (Core) 7.4+, which now (consistently) defaults to UTF-8.
- To ensure decoding as UTF-8, use `Invoke-WebRequest` rather than `Invoke-RestMethod`; the former's output objects have a `.RawContentStream` property that returns a raw byte stream that you can decode with the encoding of your choice. As noted, this is only required in PowerShell versions 7.3.x and below, including Windows PowerShell.
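Applied to the code in the question, the approach could look like the following sketch. The helper names `ConvertFrom-Utf8Stream` and `Get-Utf8Content` are hypothetical, chosen only for illustration; `$uri` and `$headers` are assumed to be defined as in the question.

```powershell
# Hypothetical helper: decodes all bytes of an in-memory stream as UTF-8,
# regardless of what the response headers did or did not declare.
function ConvertFrom-Utf8Stream {
  param([Parameter(Mandatory)] [System.IO.MemoryStream] $Stream)
  [System.Text.Encoding]::UTF8.GetString($Stream.ToArray())
}

# Hypothetical wrapper: fetches a resource with Invoke-WebRequest and
# returns the response body decoded as UTF-8 from the raw bytes.
function Get-Utf8Content {
  param(
    [Parameter(Mandatory)] [string] $Uri,
    [hashtable] $Headers = @{}
  )
  $response = Invoke-WebRequest -Uri $Uri -Headers $Headers -Method Get
  ConvertFrom-Utf8Stream -Stream $response.RawContentStream
}
```

With these in place, the call from the question would become `$fileContent = Get-Utf8Content -Uri $uri -Headers $headers`. Keeping the decoding step in its own function means it can be checked against a plain `MemoryStream` without contacting a server.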
Decoding the raw bytes relies on `[System.Text.Encoding]::UTF8` to obtain a UTF-8 encoding, and on its `.GetString()` method to convert an array of bytes to a .NET string.

Background information:
- The `-ContentType` parameter describes the media type and, optionally, the character encoding of the body (data) sent with the request, not what you'd like to receive as a response.
  - Since you're merely performing a `GET` request without using the `-Body` parameter, the `-ContentType` argument is effectively ignored.
- While a header field is generally available that signals to the server what response character encoding is desired, namely `Accept-Charset`, it is rarely honored in practice.
  - I presume the same applies if you use a `charset` parameter in the context of also requesting specific media types, via the `Accept` header field.
- It is therefore the server that decides what character encoding to encode the response with and, crucially, whether or not to explicitly indicate that encoding in the `Content-Type` response-header field, e.g. `Content-Type: text/markdown; charset=utf-8`
  - Strictly speaking, the media type for Markdown text, `text/markdown` (assuming that it is used in the server's response), should contain a `charset` parameter, which PowerShell's web cmdlets do honor.
- In the absence of such a `charset` parameter, it is therefore the default character encoding that applies, as used by PowerShell's web cmdlets, `Invoke-WebRequest` and `Invoke-RestMethod`.

The default character encoding used by the `Invoke-WebRequest` and `Invoke-RestMethod` cmdlets depends on the PowerShell edition and version, as shown in the following table:

PowerShell edition / version|Default encoding
-|-
Windows PowerShell|ISO-8859-1 [1]
PowerShell (Core) up to v7.3.x|ISO-8859-1, except for `application/json` responses,[2] which default to UTF-8
PowerShell (Core) 7.4+|UTF-8

This default encoding applies not only to decoding responses, but also to encoding request data, namely when you pass a string to the `-Body` parameter (you may alternatively pass arbitrary `[byte]` arrays); you can override this with a `charset` parameter in the `-ContentType` argument, e.g.: `-ContentType 'application/json; charset=utf-8'`

If, in a given call, the response body gets mis-decoded due to the above-mentioned defaults, you need to manually decode the raw bytes, as described in the top section.
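To see the effect of a mismatched default in isolation, the following sketch (no network access needed) encodes a character as UTF-8, decodes the bytes as ISO-8859-1 the way the older cmdlet versions do for undeclared responses, and then reverses the damage. The variable names are illustrative only, and the character is built from its code point so that the script file's own encoding does not matter.

```powershell
$utf8   = [System.Text.Encoding]::UTF8
$latin1 = [System.Text.Encoding]::GetEncoding(28591)  # code page 28591 = ISO-8859-1

$original = [string][char]0x20AC         # the euro sign
$bytes    = $utf8.GetBytes($original)    # three bytes: 0xE2 0x82 0xAC

# Mis-decoding: each byte becomes one character, so one character turns into three.
$garbled  = $latin1.GetString($bytes)

# Repair: map the characters back to the same bytes, then decode them as UTF-8.
$repaired = $utf8.GetString($latin1.GetBytes($garbled))

$garbled.Length          # 3
$repaired -eq $original  # True
```

Because ISO-8859-1 maps every byte value to exactly one character, the round trip loses nothing, which is why a string that was already mis-decoded this way can usually still be recovered.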
[1] This encoding is largely identical to Windows-1252, except that the following characters are missing, notably including `€`: `€ ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ Ž ‘ ’ “ ” • – — ˜ ™ š › œ ž Ÿ`

[2] Note that request JSON data passed as a string to the `-Body` parameter is, curiously, still encoded as ISO 8859-1 by default, an inconsistency that was resolved in v7.4.
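For the request direction, one way to avoid depending on the version-specific default is to encode the body yourself and pass the bytes. The sketch below only prepares the body; the JSON payload is made up for illustration, and `$postUri` in the commented-out call is a placeholder rather than anything from the question.

```powershell
# Build a JSON string containing a non-ASCII character (U+00E9) from its code point.
$json = '{"title":"caf' + [char]0x00E9 + '"}'

# Encode explicitly as UTF-8, so the cmdlet's default string encoding never applies.
$bodyBytes   = [System.Text.Encoding]::UTF8.GetBytes($json)
$contentType = 'application/json; charset=utf-8'

# Hypothetical call; $postUri is a placeholder:
# Invoke-RestMethod -Uri $postUri -Method Post -Body $bodyBytes -ContentType $contentType
```

Declaring `charset=utf-8` in `-ContentType` tells the server how the bytes are encoded; passing a byte array rather than a string keeps the cmdlet from re-encoding the body.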