I am using Lumenworks.Framework.IO.Csv.CsvReader
to read Csv files and would like to detect badly formed files. If a row has fewer columns than the header then it throws LumenWorks.Framework.IO.Csv.MissingFieldCsvException
. However, if a row has more columns than the header then it just truncates the row when parsing it. Are there any properties I can set to make it throw? Or another CSV parser that is efficient, easy to use, and will detect this issue?
My test file looks like
Field 1,Field 2,Field 3,Field 4
This,data,looks,ok
But,this,has,too,many,fields
My integration test (NUnit) looks like
[Test, ExpectedException(typeof(MalformedCsvException))]
public void Row_cannot_have_more_fields_than_the_header()
{
using (var stream = File.OpenText("MoreColumnsThanHeader.csv"))
new CsvParser().ReadCsv(stream);
}
and my code to read the data
public DataSubmission ReadCsv(StreamReader streamReader)
{
var data = new DataSubmission();
using (var reader = new CsvReader(streamReader, true))
{
var items = new List<Row>();
var fieldCount = reader.FieldCount; //this is 4 in the test
var headers = reader.GetFieldHeaders();
while (reader.ReadNextRecord()) //reader has a size 4 array for the 6 item row
items.Add(ReadRow(fieldCount, headers, reader));
data.Items = items;
}
return data;
}
private static Row ReadRow(int fieldCount, IList<string> headers, CsvReader reader)
{
var item = new Row();
var fields = new List<Field>();
for (var index = 0; index < fieldCount; index++)
fields.Add(ReadField(headers, reader, index));
item.Fields = fields;
return item;
}
private static Field ReadField(IList<string> headers, CsvReader reader, int index)
{
return new Field {FieldName = headers[index], FieldValue = NullifyEmptyString(reader, index)};
}
private static string NullifyEmptyString(CsvReader reader, int index)
{
return string.IsNullOrWhiteSpace(reader[index]) ? null : reader[index];
}
EDIT: Since creating this question I have changed my CSV parser to use Microsoft.VisualBasic.FileIO.TextFieldParser. It's easy to use, performs well even with large files, and is more robust than the Lumenworks offering. I had issues with the Lumenworks parser when dealing with line breaks in a quoted string. The Microsoft parser handles this well.
Try using the DataTable csv reader (
nuget csvtools
) from Mike Stall.If in any of the
Read
methods inDataTable.New
you setallowMismatch = false
, then it will throw an exception if the number of columns in a given row does not equal the expected number of columns.