I am using the awesome LumenWorks CSV reader to process CSV files. Some files have over 1 million records.
What I want is to process the file in sections. E.g. I want to process 100,000 records first, validate the data and then send these records over an Internet connection. Once sent, I reopen the file and continue from record 100,001, and so on until I finish processing the file. In my application I have already created the logic for keeping track of which record I am currently processing.
Does the LumenWorks parser support processing from a predetermined line in the CSV, or does it always have to start from the top? I see it has a buffer variable. Is there a way to use this buffer variable to achieve my goal?
my_csv = New CsvReader(New StreamReader(file_path), False, ","c, buffer_variable)
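For context, the only way I can see so far to resume with the reader itself is to re-read and discard everything up to the last processed record, which is exactly the overhead I want to avoid. A sketch of that, where records_already_sent and batch_size come from my own tracking logic:

Imports System.IO
Imports LumenWorks.Framework.IO.Csv

Module ResumeByRereading
    ' Sketch only: re-open the file each time, skip the records that were
    ' already validated and sent, then process the next batch.
    Sub ProcessNextBatch(ByVal file_path As String, ByVal records_already_sent As Integer, ByVal batch_size As Integer)
        Using my_csv As New CsvReader(New StreamReader(file_path), False, ","c)
            ' Skip everything that has already been handled.
            For i As Integer = 1 To records_already_sent
                If Not my_csv.ReadNextRecord() Then Exit Sub
            Next

            ' Process up to batch_size further records (e.g. 100,000).
            Dim processed As Integer = 0
            While processed < batch_size AndAlso my_csv.ReadNextRecord()
                ' Validate and send my_csv(0) .. my_csv(my_csv.FieldCount - 1) here.
                processed += 1
            End While
        End Using
    End Sub
End Module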
It seems the LumenWorks CSV Reader needs to start at the top. I needed to ignore the first n lines in a file and attempted to pass a StreamReader that was already at the correct position/row, but got a "Key already exists" Dictionary error when I attempted to get the FieldCount (there were no duplicates).
However, I have found some success by first reading the pre-trimmed file into a StringBuilder and then into a StringReader so that the CSV Reader can read it. Your mileage may vary with huge files, but it does help to trim the file first.
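A sketch of that idea in VB.NET (not the exact code - ProcessFromRow and skip_count are illustrative names, and the file is assumed to have a header row):

Imports System.IO
Imports System.Text
Imports LumenWorks.Framework.IO.Csv

Module TrimmedReadSketch
    Sub ProcessFromRow(ByVal file_path As String, ByVal skip_count As Integer)
        Dim trimmed As New StringBuilder()

        Using sr As New StreamReader(file_path)
            ' Keep the header row so the CSV reader still sees the column names.
            trimmed.AppendLine(sr.ReadLine())

            ' Throw away the rows that have already been processed.
            For i As Integer = 1 To skip_count
                If sr.ReadLine() Is Nothing Then Exit For
            Next

            ' Copy the remainder of the file.
            Dim line As String = sr.ReadLine()
            While line IsNot Nothing
                trimmed.AppendLine(line)
                line = sr.ReadLine()
            End While
        End Using

        ' Hand the trimmed text to the CSV reader via a StringReader.
        Using csv As New CsvReader(New StringReader(trimmed.ToString()), True, ","c)
            While csv.ReadNextRecord()
                ' Validate / send each record here.
            End While
        End Using
    End Sub
End Module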
You might be able to adapt this solution to reading chunks of a file - e.g. as you read through the StreamReader, assign different "chunks" to a Collection of StringBuilder objects, and also pre-pend the header row if you want it.
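For example, something along these lines (again just a sketch - SplitIntoChunks and chunk_size are made-up names; each chunk gets the header row prepended so the CSV reader can parse it on its own):

Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module ChunkingSketch
    ' Split a CSV file into chunks of chunk_size data rows, each prefixed
    ' with the header row, ready to be parsed independently.
    Function SplitIntoChunks(ByVal file_path As String, ByVal chunk_size As Integer) As List(Of StringBuilder)
        Dim chunks As New List(Of StringBuilder)()

        Using sr As New StreamReader(file_path)
            Dim header As String = sr.ReadLine()
            Dim current As StringBuilder = Nothing
            Dim rows As Integer = 0

            Dim line As String = sr.ReadLine()
            While line IsNot Nothing
                If current Is Nothing OrElse rows = chunk_size Then
                    ' Start a new chunk and pre-pend the header row.
                    current = New StringBuilder()
                    current.AppendLine(header)
                    chunks.Add(current)
                    rows = 0
                End If
                current.AppendLine(line)
                rows += 1
                line = sr.ReadLine()
            End While
        End Using

        Return chunks
    End Function
End Module

Each chunk can then be wrapped in a StringReader and handed to the CSV reader exactly as above, so the parser always starts at the top of its own small input. With a million-plus records you may prefer to write the chunks out to temporary files rather than hold them all in memory.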