Casting each char from span vs MemoryMarshal.Cast<byte, char>


When debugging operations on UTF-8 strings, I sometimes want to see the string representation of a given ReadOnlySpan<byte>, so I created a static helper function for it. However, one of the approaches doesn't work as expected, and I wonder why the resulting string is incomprehensible.

//#define FORCE_NOT_UTF8

using MemoryMarshal = System.Runtime.InteropServices.MemoryMarshal;
using Unsafe = System.Runtime.CompilerServices.Unsafe;
using Encoding = System.Text.Encoding;

static string ForgeString(ReadOnlySpan<byte> utf8Runes)
{
    Span<char> buffer = utf8Runes.Length > 1024
        ? new char[utf8Runes.Length]
        : stackalloc char[1024]
    ;
#if FORCE_NOT_UTF8
    Encoding.UTF8.GetChars(utf8Runes, buffer);
#else
    if (Encoding.Default.BodyName != Encoding.UTF8.BodyName)
    {
        Encoding.UTF8.GetChars(utf8Runes, buffer);
    }
    else if(buffer.Length is <= 1024)
    {
        MemoryMarshal.Cast<byte, char>(utf8Runes).CopyTo(buffer);
    }
    else
    {
        ref readonly var elmnt0 = ref utf8Runes[0];
        ref var ptrSrc = ref Unsafe.AsRef(in elmnt0);
        ref var ptrDst = ref buffer[0];

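        // Check the bounds first, so the loop never dereferences one element past the end of the span
        for(int i = 0; i < utf8Runes.Length && ptrSrc is not default(byte); i++)
        {
            ptrDst = (char) ptrSrc; // widens one byte into one char
            ptrSrc = ref Unsafe.Add(ref ptrSrc, 1);
            ptrDst = ref Unsafe.Add(ref ptrDst, 1);
        }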
    }
#endif

    Index end = buffer.IndexOf(default(char)) is int index and not -1 ? new(index) : Index.End;

    return new(buffer[..end]);
}

string result1 = default!;
string result2 = default!;

result1 = ForgeString("foobar"u8);
result2 = ForgeString("james james james (...repeating 166 times)"u8);

Console.WriteLine(result1);
Console.WriteLine(result2);

//to get string result3, it's necessary to recompile with the compiler symbol FORCE_NOT_UTF8 defined

The for loop works as expected and prints 'james' many times, but with MemoryMarshal.Cast, 'foobar' produces '潦扯牡'. What is happening inside Cast<TFrom, TTo> that creates this unexpected sequence? I thought it literally applied a (T) cast to each element of the given span.
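A small experiment may help narrow this down. The sketch below assumes .NET 7+ (for u8 literals) and a little-endian machine, which covers the usual .NET targets. It contrasts three things: MemoryMarshal.Cast, which reinterprets the same memory instead of converting each element, so every pair of bytes is read as one 16-bit char; a per-element (char) widening like the for loop; and real UTF-8 decoding with Encoding.UTF8.GetString. The variable names are illustrative only.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

ReadOnlySpan<byte> utf8 = "foobar"u8; // bytes: 66 6F 6F 62 61 72

// MemoryMarshal.Cast reinterprets the same memory: each pair of bytes becomes one
// 16-bit char (little-endian assumed), so 6 bytes become 3 chars. No per-element conversion happens.
ReadOnlySpan<char> reinterpreted = MemoryMarshal.Cast<byte, char>(utf8);
string castResult = new string(reinterpreted);

// Converting element by element widens each byte into its own char.
char[] widened = new char[utf8.Length];
for (int i = 0; i < utf8.Length; i++)
{
    widened[i] = (char)utf8[i];
}
string widenResult = new string(widened);

// Decoding with the UTF-8 decoder, which also handles multi-byte sequences.
string decoded = Encoding.UTF8.GetString(utf8);

// Widening is only right for ASCII: 'é' (U+00E9) is the two UTF-8 bytes C3 A9,
// which widen into the two chars U+00C3 and U+00A9.
ReadOnlySpan<byte> accented = "\u00E9"u8;
string accentWidened = new string(new[] { (char)accented[0], (char)accented[1] });
string accentDecoded = Encoding.UTF8.GetString(accented);

Console.WriteLine($"Cast:    {castResult} (length {castResult.Length})");
foreach (char c in castResult)
{
    Console.Write($"U+{(int)c:X4} "); // print the raw code unit values
}
Console.WriteLine();
Console.WriteLine($"Widened: {widenResult}");
Console.WriteLine($"Decoded: {decoded}");
Console.WriteLine($"U+00E9 widened: {accentWidened}, decoded: {accentDecoded}");
```

On a little-endian machine the code unit line should read U+6F66 U+626F U+7261: the bytes 66 6F ('f', 'o') form 0x6F66, and so on, and those are the code points of 潦, 扯 and 牡. The widening loop happens to match the decoder for "foobar" only because the input is pure ASCII.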
