FileStream and DataGridView memory issue with CsvHelper


TL;DR

  • Reading and modifying flat files in memory before passing a stream to CsvHelper to process as normal
  • Process works fine when tested on ~32k records, run multiple times
  • Process works only once when run on a file with 5m+ records, then fails if you try to run it a second time
  • System.OutOfMemoryException error thrown

Linked to this post: CsvHelper - Set the header row and data row

This is a new question since I've come up with a potential solution that deviates from the original post, but I'm now facing a different issue.

So I amended the test sample data as follows (I added a pipe in row 7):

This is a random line in the file

SOURCE_ID|NAME|START_DATE|END_DATE|VALUE_1|VALUE_2

Another random line in the file

|


GILBER|FRED|2019-JAN-01|2019-JAN-31|ABC|DEF
ALEF|ABC|2019-FEB-01|2019-AUG-31|FBC|DGF
GILBER|FRED|2019-JAN-01|2019-JAN-31|ABC|TEF
FLBER|RED|2019-JUN-01|2019-JUL-31|AJC|DEH
GI|JOE|2020-APR-01|2020-DEC-31|GBC|DER

I decided to try and manipulate the inbound file in memory and then pass that stream into CsvHelper to process.

I ended up with the below code:

// Using BufferedStream for speed
// https://stackoverflow.com/questions/2161895/reading-large-text-files-with-streams-in-c-sharp
// Read from memory stream
// https://stackoverflow.com/questions/1232443/writing-to-then-reading-from-a-memorystream

int header_row = 3; //row the header is on
int data_row = 10; //row the data starts from

using (FileStream fs = File.Open(filepath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (BufferedStream bs = new BufferedStream(fs))
using (var stream = new MemoryStream())
using (StreamWriter sw = new StreamWriter(stream))
using (StreamReader sr = new StreamReader(bs))
{
    string line;
    int i = 0;
    while ((line = sr.ReadLine()) != null)
    {
        i++;
        if (i < header_row) // skip lines before the header row
            continue;
        if (i > header_row && i < data_row) // skip lines between the header row and the start of the data
            continue;

        // write to the memory stream if all conditions pass
        // (no per-line Flush needed; the single Flush after the loop is enough)
        sw.WriteLine(line);

    }

    sw.Flush();
    stream.Position = 0; //reset position

    // continue using CsvHelper as before, feeding in the 'stream' from memory rather than a file
    using (var reader = new StreamReader(stream))
    using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
    {
        csv.Configuration.Delimiter = "|"; // Set delimiter

        // Load csv to datatable and set dgv source
        using (var dr = new CsvDataReader(csv))
        {
            var dt = new DataTable();
            dt.Load(dr);
            dgvTst04_View.DataSource = dt; // EXCEPTION IS THROWN HERE
        }
    }
}

And I get the below result in the DataGridView:

Sample file test result

So this works!!

But when I run the same code on a CSV file with 5m+ records, it runs okay once (~24s, which is about the same as importing it directly into CsvHelper with no pre-manipulation). When I run it a second time, it throws a System.OutOfMemoryException.

For context, I have 64GB of memory and the process seems to peak at 2GB of usage (but it doesn't drop afterwards). So I feel like the 'using' blocks are not disposing of the memory/variables correctly, as I had assumed usage would come back down after the run. Before and after screenshots of the diagnostics are below:

Before running: Diagnostics before run

After running: Diagnostics after run

Am I not handling the variables correctly in my code, or not disposing of them properly? Although I thought that if I use 'using' I shouldn't have to dispose of them manually.
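
For what it's worth, one diagnostic I'm assuming could narrow this down is to force a full collection after the outer using block completes, and see whether the managed heap actually shrinks. Purely a diagnostic sketch, not something I'd leave in the final code:

// Diagnostic only (System.Diagnostics for Debug.WriteLine): force a full
// collection after everything has been disposed, then report how much
// managed memory is still reachable.
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
Debug.WriteLine($"Managed heap after collect: {GC.GetTotalMemory(true):N0} bytes");

If the reported number drops back down, the memory was just waiting to be collected; if it stays near 2GB, something must still be holding a reference.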

Additional info: I ran the same code on a file with 32k+ rows of data (with a similar header/data row structure) multiple times within the same session (10+ runs); it runs in 27 milliseconds on average and no 'System.OutOfMemoryException' is thrown.
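
If it turns out to be the MemoryStream itself, the fallback I'm considering is to write the filtered copy to a temp file instead of holding it in memory, so usage stays flat regardless of file size. A rough sketch of that idea, using the same filtering logic as above (untested on the 5m-row file):

int header_row = 3;  // row the header is on
int data_row = 10;   // row the data starts from

// Filter the junk lines into a temp file on disk instead of a MemoryStream
string tempPath = Path.GetTempFileName();

using (var sr = new StreamReader(filepath))
using (var sw = new StreamWriter(tempPath))
{
    string line;
    int i = 0;
    while ((line = sr.ReadLine()) != null)
    {
        i++;
        if (i == header_row || i >= data_row) // keep only the header row and the data rows
            sw.WriteLine(line);
    }
}

// Feed CsvHelper from the filtered temp file, exactly as before
using (var reader = new StreamReader(tempPath))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
    csv.Configuration.Delimiter = "|"; // Set delimiter

    using (var dr = new CsvDataReader(csv))
    {
        var dt = new DataTable();
        dt.Load(dr);
        dgvTst04_View.DataSource = dt;
    }
}

File.Delete(tempPath); // clean up the filtered copy

The trade-off is an extra disk write, but that seems preferable to a MemoryStream that has to hold the whole filtered file.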

Let me know if you would like the 5m record sample file (it's a sample file I found online on the NZ government's website, so it's public information).

Thanks!
