What to use instead of dontEscape for constructing a data URI?


The obsolete Uri(string, bool) constructor is used to construct a URI from an already escaped string (it was presumably made obsolete so that an invalid string cannot break the program). However, I find myself in a situation where I need to pass literal bytes via the URI, and I can't think of a better way to encode them.

I am constructing a data: URI, which is a standardized way to pass a whole resource instead of its identifier. Although I am aware it has a ;base64 specifier to mark the passed data as encoded in base64, there are situations when the URI is shorter without base64, for example when there are less binary data. Because I don't want to worry about encodings, I simply want to pass the bytes in the URI as a URI-encoded string, using HttpUtility.UrlEncode(byte[]).

Since I am practically left with no other choice than to let .NET encode the string for me, without having to use obsolete constructors, and there is no Uri(byte[]) constructor (there should be, in my opinion), what are my options to construct the URI?

I thought about using Encoding.GetEncoding(1252) to create a string from the bytes and use that, as cp1252 can decode any character, but it seems the internal Uri encoding method uses UTF-8 to encode the characters, so I don't find it possible to use a text encoding at all.

What are my options? Is it okay to continue using the obsolete constructor, if there is no other way?


There are 2 best solutions below

Best answer

Well, the standard Uri(string) constructor accepts pre-encoded URIs and doesn't re-encode valid % escapes, so the dontEscape parameter isn't really necessary when constructing a Uri from a valid URI string that contains encoded parts. They won't get re-encoded.
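
As a minimal sketch of that point, the following passes an already percent-encoded payload straight to the one-argument constructor. The class and method names (PreEscapedUriSketch, FromEscapedPayload) are hypothetical and only serve the illustration.

```csharp
using System;

public static class PreEscapedUriSketch
{
    // Builds a data: URI from a payload that is already percent-encoded.
    // (Hypothetical helper, not part of .NET.)
    public static Uri FromEscapedPayload(string mediaType, string escapedPayload)
    {
        // The one-argument constructor parses the string as given;
        // no dontEscape flag is involved.
        return new Uri("data:" + mediaType + "," + escapedPayload);
    }
}
```

Calling PreEscapedUriSketch.FromEscapedPayload("application/octet-stream", "%DE%AD%BE%EF") should give a Uri whose OriginalString is exactly the string that was passed in.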

Second answer

> there are situations when the URI is shorter without base64, for example when there are less binary data

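Whether the URI is shorter without base64 depends on the data. Base64 produces text from a deliberately limited character repertoire, so it costs four characters for every three octets, plus the ;base64 marker. Percent-encoding costs one character for an octet that needs no escaping and three for one that does, so it only wins when most of the data is plain text.

Base64 can be skipped when the data is textual. Otherwise the result is going to be gibberish.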
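
To compare the two forms for a given payload, one can simply measure both. The sketch below is an assumption-laden illustration: DataUriLengthSketch and its members are hypothetical names, and the percent-encoder escapes everything outside the RFC 3986 unreserved set, which is stricter than a data: URI strictly requires.

```csharp
using System;
using System.Text;

public static class DataUriLengthSketch
{
    // Unreserved characters from RFC 3986; these never need escaping.
    private const string Unreserved =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    // Percent-encodes every octet that is not an unreserved ASCII character.
    public static string PercentEncode(byte[] data)
    {
        var sb = new StringBuilder(data.Length * 3);
        foreach (byte b in data)
        {
            if (b < 0x80 && Unreserved.IndexOf((char)b) >= 0)
                sb.Append((char)b);
            else
                sb.Append('%').Append(b.ToString("X2"));
        }
        return sb.ToString();
    }

    // Length of the payload when percent-encoded.
    public static int PercentEncodedLength(byte[] data)
    {
        return PercentEncode(data).Length;
    }

    // Length of the payload in base64, including the ";base64" marker.
    public static int Base64Length(byte[] data)
    {
        return ";base64".Length + Convert.ToBase64String(data).Length;
    }
}
```

Whichever of PercentEncodedLength and Base64Length is smaller for the actual bytes indicates the shorter URI.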

> as cp1252 can decode any character

No, it can encode only 251 characters, unlike, say, UTF-8, which can encode every character in the UCS. UTF-8 cannot decode every sequence of bytes, while some incorrect CP-1252 implementations fill the gaps in CP-1252 (e.g. 0x81) with something. Even if you could depend on that (you can't), it isn't sensible: you are building a string, so the encoding doesn't matter yet, except for any %-escaped characters, and those will always be escaped according to their encoding in UTF-8. (A long time ago URLs [the term URI didn't exist yet] could be escaped according to other encodings, but that didn't work because there was no way to know which encoding had been used, hence the standards mandating UTF-8 since 1998.)
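
A small sketch of why arbitrary bytes cannot be smuggled through a string: decoding as UTF-8 and encoding again does not always give the original bytes back. EncodingSketch and SurvivesUtf8RoundTrip are hypothetical names used only for this illustration.

```csharp
using System;
using System.Linq;
using System.Text;

public static class EncodingSketch
{
    // True if the bytes come back unchanged after a UTF-8 decode and re-encode.
    public static bool SurvivesUtf8RoundTrip(byte[] data)
    {
        // Invalid sequences are replaced with U+FFFD while decoding,
        // so the original octets are lost at this step.
        string text = Encoding.UTF8.GetString(data);
        return Encoding.UTF8.GetBytes(text).SequenceEqual(data);
    }
}
```

ASCII text such as the bytes of "Hi" survives the round trip, while a lone 0xFF octet does not. In the other direction, Uri.EscapeDataString("\u00E9") produces %C3%A9, the UTF-8 octets of é, regardless of which code page the text originally came from.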

> Is it okay to continue using the obsolete constructor

No, it produces buggy results.

URIs are built on top of text. If your data is textual then just encode it by normal URI rules through Uri.EscapeDataString(). If your data is not textual then use base-64 to encode it as text, and then go from there. Don't try to put something in a URI that isn't meaningful in a URI.
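
That advice could be sketched roughly as follows. DataUriSketch, FromText and FromBytes are hypothetical helper names, and the media types are left to the caller; treat this as an illustration under those assumptions, not a complete data: URI builder.

```csharp
using System;

public static class DataUriSketch
{
    // Textual data: escape it by normal URI rules.
    public static Uri FromText(string mediaType, string text)
    {
        return new Uri("data:" + mediaType + "," + Uri.EscapeDataString(text));
    }

    // Non-textual data: turn it into text with base64 first.
    public static Uri FromBytes(string mediaType, byte[] data)
    {
        return new Uri("data:" + mediaType + ";base64," + Convert.ToBase64String(data));
    }
}
```

For example, DataUriSketch.FromText("text/plain", "a b&c") is built from the string data:text/plain,a%20b%26c, and neither helper needs the obsolete constructor.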