Read huge nvarchar(max) as UTF-16 stream without saturating memory


I am developing a program that reads text from a SQL Server nvarchar(max) field and needs to write it as a UTF-16 text file inside a zip.

I cannot load the whole string contained in the field into memory, as it weighs more than 1 GB and would saturate memory.

What I am trying to do is stream it directly into a .zip created on the fly.

I found a method, but I do not get a valid UTF-16 file, since I read the bytes of the SQL field directly. Is there another approach?

using (SqlDataReader Rd = DbMan.GetDbReader(command, CommandBehavior.SequentialAccess) as SqlDataReader) // Opens a datareader in SequentialAccess to avoid impacting memory
{
    using (MemoryStream MSZip = new MemoryStream()) // This stream contains the .zip created on the fly
    {
        using (var Zip = new ZipArchive(MSZip, ZipArchiveMode.Create)) // Creating the .zip
        {
            while (Rd.Read())
            {
                var FileJsonInsideZip = Zip.CreateEntry($"MyFile{Rd.GetString(0)}.json", CompressionLevel.Fastest); // Creating the empty file inside the .zip

                using (var StreamFileInsideZip = FileJsonInsideZip.Open()) // This stream is the file created inside the .zip
                {
                    byte[] buffer = new byte[8096]; // Read 8096 bytes at a time and add them progressively to the zip, to not saturate memory
                    long offset = 0;
                    long read;
                    while ((read = Rd.GetBytes(1, offset, buffer, 0, buffer.Length)) > 0)
                    {
                        offset += read;
                        StreamFileInsideZip.Write(buffer, 0, (int)read);
                    }
                }
            }
        }
    }
}

I also tried with a TextReader. This approach produces a valid UTF-16 file, but saturates memory:

byte[] sUtf16Bytes = Encoding.Unicode.GetBytes(Rd.GetTextReader(1).ReadToEnd());
StreamFileInsideZip.Write(sUtf16Bytes, 0, sUtf16Bytes.Length); 

EDIT: Sorry, the first code actually produced a valid file, but without a BOM. The generated JSON file is imported into SQL Server; without a BOM the import fails with the error "SINGLE_NCLOB requires a UNICODE (widechar) input file. The file specified is not Unicode.", which misled me.
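Since nvarchar(max) is stored as UTF-16 LE, the raw bytes read with GetBytes are already valid UTF-16; only the BOM (0xFF 0xFE) is missing. A minimal sketch of the fix, assuming the entry stream from the question, is to write the preamble once before the copy loop (the helper name and stream here are illustrative, not from the original code):

```csharp
using System;
using System.IO;
using System.Text;

class BomSketch
{
    // Hypothetical helper: write the UTF-16 LE BOM before streaming
    // the raw nvarchar bytes, so consumers such as SQL Server's
    // SINGLE_NCLOB import recognize the file as Unicode.
    static void WriteUtf16LeBom(Stream destination)
    {
        byte[] bom = Encoding.Unicode.GetPreamble(); // { 0xFF, 0xFE }
        destination.Write(bom, 0, bom.Length);
    }

    static void Main()
    {
        // Stand-in for StreamFileInsideZip from the question.
        using var entryStream = new MemoryStream();
        WriteUtf16LeBom(entryStream);
        // ... then run the GetBytes copy loop exactly as in the question ...

        byte[] written = entryStream.ToArray();
        Console.WriteLine(written[0] == 0xFF && written[1] == 0xFE); // True
    }
}
```

The call would go immediately after `FileJsonInsideZip.Open()`, once per entry, before any data bytes are written.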

1 Answer

Answered by Charlieface:

You should move the buffer allocation outside the loop, so it is allocated once and reused across rows.

You may also want to initialize the MemoryStream with a large initial capacity, to avoid repeated internal copying as it grows.

byte[] buffer = new byte[8096]; //Reading 8096 bytes and adds progressively to zip, to not saturate the memory

using SqlDataReader Rd = (SqlDataReader)DbMan.GetDbReader(command, CommandBehavior.SequentialAccess);
using MemoryStream MSZip = new MemoryStream(); //This stream contains the .zip created on the fly
using (var Zip = new ZipArchive(MSZip, ZipArchiveMode.Create, leaveOpen: true)) //Creating .zip, leave it open so you can use MSZip after
{
    while (Rd.Read())
    {
        var FileJsonInsideZip = Zip.CreateEntry($"MyFile{Rd.GetString(0)}.json", CompressionLevel.Fastest); //Creating the empty file inside .zip

        using var StreamFileInsideZip = FileJsonInsideZip.Open(); //This stream is the file created inside the .zip
        long offset = 0;
        long read;
        while ((read = Rd.GetBytes(1, offset, buffer, 0, buffer.Length)) > 0)
        {
            offset += read;
            StreamFileInsideZip.Write(buffer, 0, (int)read);
        }
    }
}
MSZip.Position = 0;  // reset position of memorystream to copy somewhere else