I am using a BinaryReader to read a file and split it on newlines (\n) into ReadOnlySpan<byte> values.
For context, I want bytes rather than strings because I am using Utf8JsonReader and am trying to avoid copying from a string to a byte array.
The large buffer is deliberate: 16 kB is fine for the application, and the data is processed one buffer at a time.
However, compared to File.ReadAllBytes(filename), which completes in 1 second, the code below takes 30+ seconds on the same machine.
I was naively assuming that BinaryReader would read ahead and cache data in advance. That does not seem to be the case, or at least I can't find any flags that enable it.
How can I improve the performance, or implement the line splitting with an alternative class?
    using System;
    using System.IO;

    public static class Program
    {
        static void Main(string[] args)
        {
            using var fileStream = File.Open(args[0], FileMode.Open);
            using (var reader = new BinaryReader(fileStream))
            {
                var i = 0;
                ReadOnlySpan<byte> line = null;
                while ((line = reader.ReadLine()) != null)
                {
                    // Process the line here, one at a time.
                    i++;
                }
                Console.WriteLine("Read line " + i);
            }
        }
    }

    public static class BinaryReaderExtensions
    {
        // Returns the next line including its trailing '\n', or an empty (null) span when done.
        public static ReadOnlySpan<byte> ReadLine(this BinaryReader reader)
        {
            if (reader.IsEndOfStream())
                return null;

            // Buffer size is deliberate, we process one line at a time.
            // Note: a new 16 kB array is allocated on every call.
            var buffer = new byte[16384];
            var i = 0;
            while (!reader.IsEndOfStream() && i < buffer.Length)
            {
                // One ReadByte call (plus one end-of-stream check) per byte.
                if ((buffer[i] = reader.ReadByte()) == '\n')
                    return new ReadOnlySpan<byte>(buffer, 0, i + 1);
                i++;
            }

            // Reached the end of the stream, or filled the buffer, without finding '\n'.
            return null;
        }

        // Queries the underlying stream's Position and Length on every call.
        public static bool IsEndOfStream(this BinaryReader reader)
        {
            return reader.BaseStream.Position == reader.BaseStream.Length;
        }
    }
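
For comparison, here is a minimal sketch of the kind of alternative I have in mind. It is untimed and is not a confirmed fix. It assumes the cost comes mainly from the per-byte ReadByte and Position/Length calls and from allocating a new 16 kB array per line, which I have not verified. It reads the stream in chunks into one reusable buffer and scans each chunk for '\n' with Span.IndexOf. The names LineSplitter, LineHandler and ForEachLine are hypothetical and made up for this example. A custom delegate is used because ReadOnlySpan<byte> cannot be a type argument of Action<T>.

```csharp
using System;
using System.IO;

public static class LineSplitter
{
    // Hypothetical delegate type for the per-line callback.
    public delegate void LineHandler(ReadOnlySpan<byte> line);

    // Hypothetical helper: invokes onLine once per line (including the trailing
    // '\n' when present) and returns the number of lines seen.
    // The span passed to onLine is only valid for the duration of the callback.
    public static int ForEachLine(Stream stream, LineHandler onLine, int bufferSize = 16384)
    {
        var buffer = new byte[bufferSize]; // allocated once and reused
        var filled = 0;                    // number of valid bytes in buffer
        var count = 0;

        while (true)
        {
            var read = stream.Read(buffer, filled, buffer.Length - filled);
            if (read == 0)
                break; // end of stream
            filled += read;

            // Emit every complete line currently in the buffer.
            var start = 0;
            while (true)
            {
                var idx = buffer.AsSpan(start, filled - start).IndexOf((byte)'\n');
                if (idx < 0)
                    break;
                onLine(new ReadOnlySpan<byte>(buffer, start, idx + 1));
                count++;
                start += idx + 1;
            }

            // Keep the unfinished tail and move it to the front of the buffer.
            var remaining = filled - start;
            if (remaining == buffer.Length)
                throw new InvalidDataException("Line is longer than the buffer.");
            buffer.AsSpan(start, remaining).CopyTo(buffer);
            filled = remaining;
        }

        // Final line without a trailing '\n', if any.
        if (filled > 0)
        {
            onLine(new ReadOnlySpan<byte>(buffer, 0, filled));
            count++;
        }
        return count;
    }
}
```

Usage would look something like `using var fs = File.OpenRead(args[0]); var n = LineSplitter.ForEachLine(fs, line => { /* process the line */ });`. Unlike my ReadLine above, this sketch also returns a final line that has no trailing '\n', and it throws instead of silently stopping when a line does not fit in the buffer.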