Which encoding replaces "í" with "\303 \255"?


Does anyone know which encoding this is? I'm told it's UTF-8, but I can't see how. This input:

aquí (notice the accent on the i)

should produce this:

aqu\303 \255

It seems to be based on this table https://www.acc.umu.se/~saasha/charsets/, but I can't see how to produce this output from an arbitrary user input string in .NET without building that huge conversion table by hand.

Any ideas?

Best answer:

It is UTF-8. The escapes are in octal: 303 255 octal is 195 173 decimal (C3 AD hex), and those numbers probably look more familiar. They are the two UTF-8 bytes that encode "í" (U+00ED). See the dec and oct headers in the table you linked.
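The conversion can be checked by hand. As a worked sketch: UTF-8 stores code points from U+0080 to U+07FF in two bytes with the bit pattern 110xxxxx 10xxxxxx, so the 11 significant bits of U+00ED are split into 5 + 6 bits and then written out in octal:

```latex
\begin{aligned}
\text{U+00ED} &= 00011\,101101_2 \quad (11 \text{ bits, split } 5 + 6) \\
\text{byte}_1 &= 110\,00011_2 = \mathrm{C3}_{16} = 195_{10} = 3\cdot 8^2 + 0\cdot 8 + 3 = 303_8 \\
\text{byte}_2 &= 10\,101101_2 = \mathrm{AD}_{16} = 173_{10} = 2\cdot 8^2 + 5\cdot 8 + 5 = 255_8
\end{aligned}
```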

There is no built-in type that will produce octal escapes for only some characters; you have to decide yourself which characters to octal-escape and which to keep as they are.

The following snippet produces the output you want (without the extra space). It keeps ASCII characters as they are and octal-escapes each UTF-8 byte of everything else:

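using System;
using System.Text;

string str = "aquí";

// Encode the whole string at once rather than one char at a time, so that
// characters outside the BMP (surrogate pairs, e.g. emoji) are encoded correctly.
byte[] bytes = Encoding.UTF8.GetBytes(str);
StringBuilder output = new StringBuilder();
foreach (byte b in bytes)
{
    if (b < 128)
    {
        // ASCII byte: its value is the character itself, keep it unescaped
        output.Append((char)b);
    }
    else
    {
        // Non-ASCII byte: emit a backslash followed by the byte value in octal
        output.Append('\\').Append(Convert.ToString(b, 8));
    }
}

string result = output.ToString(); // aqu\303\255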