Encoding.GetString from an IReadOnlyList<byte>


Is there a way to get a string out of an IReadOnlyList<byte>, given a specific Encoding?

To be more precise, is there a way that doesn't copy the content of the collection before passing it to the Encoding object?

My main concern is performance, followed by memory usage.


There are 2 best solutions below

0
On

We now have someone working on high performance and zero copy parsing of strings and byte sequences.

https://github.com/dotnet/corefxlab/blob/master/docs/specs/parsing.md

6
On

First, you would have to check whether you are using a single-byte or a dual-byte Encoding.

If you are using a single-byte encoding, you could simply use a LINQ Select to turn each byte into a string with Encoding.GetString. GetString takes a byte[] rather than a single byte, so each byte has to be wrapped in a one-element array.

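If you are using a dual-byte encoding, you could enumerate two bytes at a time into a buffer. Since you would be rewriting a value type (byte) into an array element, you would only ever use storage for two bytes during the process, although you would still be copying each byte out. Note that this only works when every character is exactly two bytes wide; a variable-width encoding such as UTF-8 would be split in the wrong places.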

I think it would look something like this, but BEWARE: I don't have a compiler on this machine so I cannot verify the syntax (this is C#-ish code :) )

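using System.Collections.Generic;
using System.Linq;
using System.Text;

public string example(IReadOnlyList<byte> someListIGotSomewhere, Encoding e)
{
 string retVal = null;
 if(e.IsSingleByte)
 {
     // Single-byte encoding: decode each byte on its own (allocates a one-element array per byte).
     retVal = string.Join("",someListIGotSomewhere.Select(b=>e.GetString(new byte[]{b})));
 }
 else
 {
   // Assumes every character is exactly two bytes wide.
   StringBuilder sb = new StringBuilder(someListIGotSomewhere.Count/2);
   var buffer = new byte[2];
   using(var enumerator = someListIGotSomewhere.GetEnumerator())
   {
     while(enumerator.MoveNext())
     {
       buffer[0] = enumerator.Current;
       // Pad with zero if the list holds an odd number of bytes.
       buffer[1] = enumerator.MoveNext()?enumerator.Current:(byte)0;
       sb.Append(e.GetString(buffer));
     }
   }
   retVal = sb.ToString();
 }
 return retVal;
}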
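
If the encoding might be variable-width (UTF-8, for example), one possible variation on the same idea is to copy the list into a small fixed-size buffer one chunk at a time and feed each chunk to a stateful Decoder obtained from Encoding.GetDecoder(). The decoder remembers incomplete byte sequences between calls, so a character split across two chunks should still come out right. This is only a sketch and has not been benchmarked; the names ReadOnlyListDecoding, DecodeInChunks and chunkSize are made up for illustration and are not framework APIs. It still copies every byte once, but never holds more than chunkSize bytes of the input at a time.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ReadOnlyListDecoding
{
    // Hypothetical helper: decodes the list chunk by chunk instead of
    // copying the whole collection into one large array first.
    public static string DecodeInChunks(IReadOnlyList<byte> source, Encoding encoding, int chunkSize = 4096)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (encoding == null) throw new ArgumentNullException(nameof(encoding));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        // The decoder keeps partial multi-byte sequences between calls.
        Decoder decoder = encoding.GetDecoder();
        byte[] byteBuffer = new byte[chunkSize];
        // GetMaxCharCount gives the worst-case number of chars for one chunk.
        char[] charBuffer = new char[encoding.GetMaxCharCount(chunkSize)];
        var result = new StringBuilder();

        int offset = 0;
        while (offset < source.Count)
        {
            // Copy the next chunk out of the list into the reusable buffer.
            int count = Math.Min(chunkSize, source.Count - offset);
            for (int i = 0; i < count; i++)
            {
                byteBuffer[i] = source[offset + i];
            }
            offset += count;

            // Flush the decoder state only on the last chunk.
            bool flush = offset == source.Count;
            int charCount = decoder.GetChars(byteBuffer, 0, count, charBuffer, 0, flush);
            result.Append(charBuffer, 0, charCount);
        }

        return result.ToString();
    }
}
```

A byte[] already implements IReadOnlyList<byte>, so a quick way to try it is ReadOnlyListDecoding.DecodeInChunks(Encoding.UTF8.GetBytes(someText), Encoding.UTF8, 3) and compare the result with someText. The tiny chunk size there is only to force characters to straddle chunk boundaries.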